Abstract
Against the backdrop of increased attention to semiotic relations in audiovisual translation, this qualitative study examines the subtitling of characters and objects from the perspective of verbal reference to visual information in the presenting and presuming systems in film. By comparatively analyzing the visual-verbal endophora, exophora, and homophora in the DVD and the YYeTs fansubbed Chinese subtitles of the English film Mission Impossible: Ghost Protocol, the study finds a stronger visual-verbal link in the DVD subtitle, in which the visual information functions to generalize unfamiliar world knowledge pertaining to uniquely presumed characters to meet target audiences halfway, omit redundant verbal information concerning explicitly presumed characters, and explicitate indefinite lexical items relating to explicitly and implicitly presumed objects. A narrative-friendly subtitle is therefore produced featuring conciseness, high readability, and good comprehensibility. However, the visual-verbal link in the fansubbed subtitle is comparatively weak due to the predominantly employed literal translation in subtitling presented/presumed characters/objects, in which the visual information is detached from the verbal content, leaving viewers with redundant or unspecific verbal messages. Extra processing effort is therefore required for constructing film narratives. These findings elucidate the significance of visual-verbal reference in subtitling presented/presumed participants for the sake of film narratives.
Keywords
Introduction
There has been a steady increase in the flow of papers dedicated to the study of multimodally oriented subtitle translation (e.g., Chen, 2019; Chen & Wang, 2022; Chuang, 2006; Perego, 2009; Pinto, 2018; Qian & Feng, 2021; Taylor, 2016, 2020; Tortoriello, 2011; N. Wang, 2019), which posit that the co-deployment of verbal and visual modes constitutes the basic feature of subtitling. Chuang (2006) and Tortoriello (2011) propose that visual information is involved in the meaning-making process in film and constitutes an integral part of the source information in subtitle translation. The necessity for subtitle translation to move away from a purely linguistic approach has also been endorsed by Taylor (2016, 2020). Qian and Feng (2021) examine the English subtitle in a Chinese costume drama in the context of polysemiotics and argue that the image and the paraverbal means of speech must be considered in subtitling for the sake of the meaning-making process. Perego (2009) puts forward the translation methods that can be adopted to restore visual information in subtitles, including addition, specification, reformulation, and so on. N. Wang (2019) also delineates the translation methods, featuring multimodal consonance between verbal and visual modes, adopted in the English subtitle of the Chinese film Farewell, My Concubine. Pinto (2018) further examines how the three dimensions (textual, diegetic, and sociocultural) that influence the subtitle translation of linguistic varieties can be multimodally erected. Chen (2019) addresses subtitle translation via the integration of multimodal analysis with Halliday’s (1994) Systemic Functional Linguistics. Chen and Wang (2022) focus on subtitling humor from the perspective of visual-verbal interplay, that is, how the visual and the verbal modes interact with each other.
However, research on how visual-verbal reference (that is, the visual and the verbal content referring to each other for their interpretation instead of being interpreted in their own right) influences the translation of characters and objects (hereinafter participants, following the tradition of Systemic Functional Linguistics) in film remains an under-charted area. The current study is therefore premised on the analysis of visual-verbal reference with a focus on characters and objects, examining the Chinese subtitles of the English film Mission Impossible: Ghost Protocol. Participants (characters and objects) are among the four most salient elements (the other two being actions and settings) that viewers attend to in their viewing processes (Smith, 2012) and the main perceptual leads for viewers to construct narratives as the film unfolds (Tseng, 2013). A narrative is a “temporal sequence” (Metz, 1974, p. 18; italics in original) and a narrative object is a “chronological sequence of events” (Metz, 1974, p. 19). This study therefore aims to reveal the contribution of visual-verbal interplay to the unfolding of temporally related events with the involvement of participants.
This paper begins with an introduction to the theoretical foundations and the theoretical framework. With a view to breaking the linguistic confinement of co-reference, this study incorporates the concept of visual-verbal reference (Baumgarten, 2008) into Tseng’s (2013) model on cohesive reference (identification) in film to formulate the theoretical framework. Then it moves on to examining the two Chinese subtitles—the DVD version and the fansubbed version by YYeTs (人人影视). Finally, it concludes by discussing its analytical findings and presents its research conclusion.
Theoretical Foundations and Framework
Due to the different modes involved in filmic texts, that is, the aural-verbal, the aural-nonverbal, the visual-verbal, and the visual-nonverbal modes (Díaz Cintas & Remael, 2021), the film narrative is never a purely linguistic issue. In this section, the theoretical foundations, including Baumgarten’s (2008) visual-verbal reference and Tseng’s (2013, 2021) cohesive reference in film, are introduced to construct the theoretical framework for this study.
Theoretical Foundations
Baumgarten’s (2008) visual-verbal referential cohesion instantiates the cross-modal reference types and Tseng’s (2013, 2021) model sketches the presenting and presuming systems of participants in film.
Visual-Verbal Referential Cohesion
In filmic text, visual and verbal information are combined autonomously to tell stories, which constitutes visual-verbal cohesion: “one linguistic/visual element is necessary for the interpretation of another from the other mode” (Baumgarten, 2008, p. 11). According to Baumgarten (2008), the relation between visual and verbal information can be described as two parallel strands of information unfolding in time, which are also inextricably connected: whenever a verbal item refers to a visual image in the communication, the two layers of information are pulled together and generate visual-verbal cohesion, that is, endophora or exophora.
In Baumgarten’s (2008, p. 12) study, endophora, including anaphora and cataphora, only occurs between verbal messages—“anaphoric and cataphoric reference integrate sequentially related verbal parts of the text,” and the visual is relegated to the contextual information. However, as filmic text is a holistic entity with the juxtaposition of visual and verbal information, anaphoric and cataphoric reference should involve sequentially related visual and verbal parts—that is, the temporal sequence in narratives takes credit for both the visual and the verbal. Scene, the film’s smallest dramatic unit (Belton, 2018), is the basic analytical unit to define this sequential relation as a scene is normally set in one location and in a single period of time (Kuhn & Westwell, 2020), in which an action is spatially and temporally continuous. Furthermore, since a scene comprises a single shot or a series of shots (Kuhn & Westwell, 2020) and a shot means a single run of the camera (Kuhn & Westwell, 2020), visual and verbal information in a shot are also sequentially related. Therefore, visual-verbal anaphoric reference is the case where a word or phrase refers back to the corresponding image appearing earlier in the same shot/scene, and visual-verbal cataphoric reference describes an item which refers forward to the image presenting later in the same shot/scene.
As for exophoric reference, Baumgarten (2008) proposes that visual information is the situational context of verbal information when the visual and the verbal information are presented in the same frame. However, as a frame is “a salient or representative still of a shot” (Iedema, 2001, p. 189), visual information is sequentially related to the verbal part in a frame rather than functioning as the situational context. The current study contends that sequence, comprising “a range of contiguous scenes which are linked [. . .] on the basis of a thematic or logical continuity” (Iedema, 2001, p. 189), is the analytical unit to retrieve exophoric reference as visual information in another sequence is the situational context of the verbal in the current sequence. Therefore, visual-verbal exophora refers to the case where the identities of reference items in the present shot are retrieved from the visual context of the situation in the preceding or following sequence. These visual-verbal references contribute to the unfolding of temporally/thematically related events in film narratives as shown in Figure 1 formulated for this study.

Figure 1. Narrative-informed visual-verbal reference in film.
Figure 1 concerns how verbal reference to visual information contributes to film narratives. It shows that visual-verbal anaphora describes a referent which refers back to an image used earlier in the same shot/scene; cataphora sees an item which refers forward to an image used later in the same shot/scene; and exophora refers to the case where the identity of a referent is retrieved from the visually delivered context of the situation in the preceding or following sequence. These visual-verbal reference types serve the narratives in the shot, scene, or sequence as long as there is “a beginning and an ending” of an event (Metz, 1974, p. 17; italics in original).
In addition to endophora and exophora, another reference type is homophora (Martin, 1992), in which the world knowledge that helps to construct narratives exists extralinguistically outside the filmic texts. The identity of an item in visual-verbal homophora is retrieved from the image shared among interlocutors in the definable community. The translator therefore must “make decisions to ensure that he/she finds the words that successfully render the effect of the whole semiotic event in the target language and for the target culture” (Taylor, 2016, p. 224).
Cohesive Reference in Film: A Participant Focus
Tseng’s (2013) model of filmic cohesive reference describes how participants are introduced and retrieved in film. The model includes three sub-systems: the network of [generic/specific] system, [presenting/presuming] system, and comparative system. For the purpose of this study, only the network of [presenting/presuming] system was examined.
According to Tseng (2013), the presentation of participants, that is, the first mention of a reference item, can be realized cross-modally with immediate visual salience or gradual visual salience. Immediate visual salience implies that the visual image of the participant is outstandingly presented onscreen and attracts viewers’ attention instantly, while gradual visual salience refers to the case where characters move toward camera gradually and change the non-salient presentation to salient presentation (Tseng, 2013).
The reference item is presumed uniquely or variably (Tseng, 2013). Unique presuming refers to the identity retrieval of participants relying on the uniqueness of reference items, for example a national flag uniquely presumes a country (Tseng, 2013). Variable presuming can be explicit or implicit. Explicit reappearance is the co-habitation of verbal and non-verbal modes, for example the combination of a character’s visual image and his/her utterance, while implicit reappearance is the mono-modal presuming of participants, for example voice of an invisible speaker or proper name of a character whose image does not appear simultaneously (Tseng, 2013).
Theoretical Framework: Narrative-Informed Visual-Verbal Reference of Participants
The integration of Tseng’s (2013) system of cohesive reference (identification) and the narrative-informed visual-verbal reference in film gives rise to the theoretical framework for this study as shown in Figure 2.

Figure 2. Narrative-based visual-verbal reference of participants in film.
The reference item is presented in two ways: verbal presentation with immediate visual salience and verbal presentation with gradual visual salience. Participants’ presentation begins in a shot and ends in the same shot/scene to construct a complete event in a narrative. The unique way of presuming, visual or verbal, resides in the shared knowledge between information senders and receivers. For the purpose of this study, only the verbally delivered unique presuming is examined. Though the corresponding image is not presented on the screen, visual-verbal reference still exists as homophora evokes recipients’ picturing of the referent. The narrative is therefore self-contained for viewers who share the same world knowledge with the film makers. The identity retrieval of variably presumed participants relies on the recall of participants’ visual information to complete the event in the narrative. Explicit reappearance means the temporally related verbal description and visual depiction of presumed participants are exposed to viewers in the same shot/scene via endophora. Implicit reappearance is realized via exophora in which the visual and the verbal in different sequences are thematically/logically related. For example, the verbal description of a presumed participant reappears in the current shot, while the visual depiction of the participant is provided in a different sequence. The completion of a narrative to this point relies on the visual message in a thematically/logically related sequence.
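The decision logic of this framework can be restated compactly in code. The following is only an illustrative sketch, not part of the study's method: the `Occurrence` fields, the function name `classify_reference`, and the handling of two edge cases (a simultaneous image is counted as anaphora; an image in the same sequence but a different scene is left unclassified) are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Where an item occurs in the film (hypothetical annotation fields)."""
    sequence: int   # index of the sequence (contiguous, thematically linked scenes)
    scene: int      # index of the scene within the film
    time: float     # onset in seconds from the start of the film


def classify_reference(verbal: Occurrence, visual: Optional[Occurrence]) -> str:
    """Label the visual-verbal reference type for one verbal item."""
    if visual is None:
        # No image in the film: identity comes from shared world knowledge.
        return "homophora"
    if visual.sequence != verbal.sequence:
        # Image supplied by a preceding or following sequence.
        return "exophora"
    if visual.scene == verbal.scene:
        # Same shot/scene: endophora, split by temporal order.
        # Assumption: a simultaneous image counts as anaphora.
        return "anaphora" if visual.time <= verbal.time else "cataphora"
    # Same sequence, different scene: not covered by the framework as described.
    return "unclassified"


word = Occurrence(sequence=3, scene=12, time=921.9)
print(classify_reference(word, Occurrence(3, 12, 918.0)))   # anaphora
print(classify_reference(word, Occurrence(3, 12, 930.0)))   # cataphora
print(classify_reference(word, Occurrence(7, 30, 2400.0)))  # exophora
print(classify_reference(word, None))                       # homophora
```

In terms of the framework, "anaphora" and "cataphora" correspond to explicit reappearance, "exophora" to implicit reappearance, and "homophora" to verbally delivered unique presuming.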
Subtitling Visual-Verbal Reference: A Narrative Focus
The data scrutinized in this study comprise two Chinese translations of the 2011 English-language spy film Mission Impossible: Ghost Protocol, the fourth installment in the Mission Impossible film series. In the film, Agent Ethan Hunt and his team members race against time to find an extremist and stop him from launching a nuclear bomb.
The rationales behind the data selection are twofold. First, Mission Impossible: Ghost Protocol has been ranked as one of the best action movies of the 21st century (O’Falt et al., 2017). It has also been warmly received by Chinese audiences, scoring 8.3 points on Douban (2021), one of the most renowned and popular film review websites in China, higher than the other five Mission Impossible films. Second, this film is a fast-paced and explosive “thriller with action sequences that function as a kind of action poetry” (Ebert, 2011), which poses a serious challenge for identity retrieval and narrative construction.
This study examined two Chinese subtitles: the DVD version (hereinafter TT1; 9,305 Chinese characters) and the fansubbed version (hereinafter TT2; 11,315 Chinese characters) by YYeTs (人人影视), the largest and top-ranked volunteer translation group in China (D. Wang, 2017). The two research questions are: (1) How do these two Chinese subtitles deal with the presented and presumed characters and objects to facilitate identity retrieval in film narratives? (2) What is the function of visual-verbal reference in this respect?
Prior to the data analysis, the analytical methods adopted in the current study are outlined in the following section.
Analytical Methods
This qualitative study examines the subtitling of characters and objects in the DVD version and the YYeTs version of the film Mission Impossible: Ghost Protocol, which lasts approximately 2 hours and 10 minutes. First, all instances in the English dialogue exchanges in which the identification of characters and objects involves visual-verbal interplay are annotated. Then, the two Chinese subtitles are examined to identify the linguistic disparities and investigate whether these differences are due to the manipulation of visual-verbal interplay. Last, the function of the visual-verbal interplay is analyzed in order to identify whether the film narrative is promoted or not.
As for examining the subtitling of characters involving visual-verbal interplay, both the presented and presumed characters are analyzed with a view to revealing visual references’ contribution to film narratives. Similarly, the subtitling of presented and presumed objects is investigated in terms of visual-verbal interplay to unravel its contribution to film narratives.
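The numerical summaries reported in the following sections are frequency counts of translation techniques per subtitle version. A minimal sketch of how such shares could be tallied from annotated instances is given below; the function name `technique_shares` and the sample labels are hypothetical and serve only to illustrate the bookkeeping, not to reproduce the study's data.

```python
from collections import Counter


def technique_shares(labels):
    """Return each technique's share of the annotated instances, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing annotated, nothing to report
    return {technique: round(100 * n / total, 1) for technique, n in counts.items()}


# Hypothetical annotations for illustration only; not the study's data.
tt1 = ["explicitation", "explicitation", "literal", "omission"]
tt2 = ["literal", "literal", "literal", "explicitation"]

print(technique_shares(tt1))  # {'explicitation': 50.0, 'literal': 25.0, 'omission': 25.0}
print(technique_shares(tt2))  # {'literal': 75.0, 'explicitation': 25.0}
```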
Subtitling Visual-Verbal Reference of Characters
This section concerns how presented and presumed characters are translated in the two Chinese subtitles and how they influence the film narrative.
Presented Characters
Characters are verbally presented by proper names or common nouns. Given that it is a common practice to transliterate proper names to introduce characters for the first time, the current paper solely investigates the subtitling of common nouns to present characters. Three cases have been identified and it is found that the visual information has not been given its deserved attention for the sake of narrating a complete event (TT1 and TT2 in Example 1).
Example (1) [0:15:21.86–0:15:24.35] ST: Armed hostiles. TT1: 是敌人 (It’s enemy.) TT2: 是武装敌人 (It’s armed enemy.)
In a railway station, Agent Brandt sees several men in suits approaching and says “Armed hostiles” to his team members via the wireless communication tool. These men are visually presented in a long shot. This is an anaphoric reference where the ST refers back to the visual used earlier in the same scene. Then these men, presented with gradual visual salience, run toward the camera to chase Brandt, and Brandt is later shot to death in an alley outside the railway station. This is a cataphoric reference in which the ST refers forward to the visual used later in the same scene. The narrative completes here with a beginning and an ending. In the TT1, the translation of “armed” is omitted, whilst in the TT2, it is rendered as 武装 (armed) in a word-for-word manner. The TT1 fails to transfer the necessary information, namely that these presented characters are dangerous and carrying pistols covertly; and the TT2 is incompatible with the visual, as 武装敌人 (armed enemy) indicates that these people carry weapons overtly.
How, then, can the translation be improved to present the characters as they are depicted and described in the film and so facilitate the unfolding of the plot? For the sake of enhancing information acquisition and improving information processing, explicitation, “making explicit in the target language what remains implicit in the source language because it is apparent from either the context or the situation” (Vinay & Darbelnet, 1995, p. 342), can be employed to produce a visual-verbal overlapping information channel that takes advantage of both the preceding and following images in the same scene. A suggested translation is therefore 是敌人，有枪 (It’s enemy with pistols), which not only correlates with the image used earlier in the same scene but also interrelates with the image used later in the scene to complete the visual-verbal narrative. This visual-verbal overlapping information provides viewers with an all-around description of the presented characters by incorporating the visual depiction into the translation.
Presumed Characters
Uniquely Presumed Characters
The identity retrieval of uniquely presumed characters by means of homophora is based on the world knowledge shared by the film-makers and the viewers. According to Popper (1972), there are three worlds: the physical world, the mental world, and the world of intelligibles. In translating homophora, the “problem is to find a match between [the two] second worlds, i.e., the worldview of the sender of the message and that of the viewer” (Pedersen, 2011, p. 57). If there is a match between the two second worlds, borrowing, “[a] word or expression borrowed directly from another language, in its form and meaning” (Vinay & Darbelnet, 1995, p. 340), can do the trick to make the uniqueness in the ST accessible to the TT receptors and ensure the narrative. If there is no such match, the most difficult situations arise: when the uniqueness in the ST is out of TT viewers’ worldview (one instance is found in the data; Example 2); or when TT viewers have a different interpretation of the uniqueness (three instances are identified in the data; Example 3). The translator can decide either to help construct the narrative by generalizing the uniqueness to fit into TT viewers’ world knowledge (TT1s in Examples 2 and 3), or to expand TT viewers’ worldview by introducing the exotic uniqueness into their mental world via the translation technique of borrowing, which, however, might impede the narrative flow (TT2s in Examples 2 and 3).
Example (2) [1:03:23.19–1:03:27.57] ST: You sure I shouldn’t wear a mask? You know, ’cause I’m not exactly Omar Sharif. TT1: 你确定我不用戴面具吗 我看着一点儿也不像阿拉伯人 (Are you sure I don’t need to wear a mask? I don’t look like a Saudi Arabian.) TT2: 你确定我不用戴面具吗 因为我可不是奥马尔·沙里夫 (Are you sure I don’t need to wear a mask because I am not Omar Sharif.)
American Agent Benji, a typical Caucasian-looking man, is to disguise himself as a Saudi Arabian waiter, but there is no full mask for him to wear. He is worried because he does not look like a Saudi Arabian in any way. As English viewers are able to register the uniqueness of “Omar Sharif” as an Arabian name, the contrast between the verbal content and the image of Benji promotes the plot unfolding. However, Chinese viewers cannot get the uniqueness of 奥马尔·沙里夫 (the borrowing translation of “Omar Sharif”) as an Arabian name, so the narrative is undermined in the TT2. The generalized translation, “in which a specific (or concrete) term is translated by a more general (or abstract) term” (Vinay & Darbelnet, 1995, p. 343), 阿拉伯人 (a Saudi Arabian) in the TT1 contributes to the narrative flow by bringing to the fore that Omar Sharif is supposed not to look like Benji, which completes the visual-verbal narrative for Chinese viewers.
Example (3) [1:26:08.48–1:26:11.20] ST: Three days in we caught wind that a Serbian hit squad was after our couple. TT1: 三天后我们听说 一群外国杀手要来杀他们 (Three days later, we heard that a group of foreign hitmen were coming to kill them.) TT2: 三天后我们得到消息 有一对塞尔维亚杀手要来袭击这对夫妇 (Three days later, we heard the news that a couple of Serbian hitmen were coming to assault this couple.)
In the film, the Serbians are shown as terrorists and assassins. In the TT1, the ST “a Serbian hit squad” is generalized as 外国杀手 (foreign hitmen) to tone down the uniqueness and fit Chinese viewers’ image of the Serbians (friends and companions) for the sake of the narrative flow. In the TT2, the literally rendered 塞尔维亚杀手 (Serbian hitmen) tries to introduce the uniqueness into Chinese viewers’ knowledge frame, which might entail political or ideological concerns and further impede the narrative, as the image of Serbians in TT viewers’ mental world is not identical to that in the ST viewers’ world knowledge.
The above analysis of the subtitling of uniquely presumed characters reveals that the uniqueness is generalized in TT1 to fit Chinese viewers’ mental world, which promotes film narratives, but it is literally rendered in TT2 to introduce the new world knowledge to Chinese viewers, which causes barriers to narratives as viewers might not be able to recall the same visual information.
Variably Presumed Characters
As for variable presuming, 72 instances of explicit reappearance and 77 instances of implicit reappearance have been identified. Please see Table 1 for the numerical summary of the analyzed data.
Table 1. Numerical Summary of Subtitling Variably Presumed Characters.
Table 1 shows that explicitly reappearing characters (where the presumed verbal text goes along with the corresponding image of the characters in the same shot/scene) are mostly rendered in TT1 by literal translation (i.e., 63%), the direct transfer of a source language text into a “grammatically and idiomatically appropriate” target language text (Vinay & Darbelnet, 1995, p. 33), followed by omission (i.e., 25%), generalization (i.e., 9%), and explicitation (i.e., 3%). Literal translation is almost the only translation technique employed in TT2 to render explicitly reappearing characters (i.e., 99%). As for the subtitling of implicitly presumed characters (where the presumed verbal text and the corresponding images are presented in different sequences), literal translation has been overwhelmingly adopted in both TT1 (i.e., 89%) and TT2 (i.e., 100%).
The literal translation of presumed characters maintains the redundant relation between the visual and the verbal information. But is it necessary to keep the redundancy in subtitling explicitly presumed characters? In other words, does the visual-verbal redundancy produce a grammatically and idiomatically appropriate TT?
In explicit reappearance, the identity of a presumed character is retrieved because of the cohabitation of the verbal description and the visual depiction in the same shot/scene. The temporal sequence between the visual and the verbal information causes immediate visual-verbal cohesion and less cognitive processing effort is required to retrieve the stored information for identification retention (Ghia, 2012). It is further noted that 97% of explicit reappearance (i.e., 82 out of 84 instances) concerns the film’s primary characters, whose images and names are exposed to viewers every now and then. Literal translation, entailing unnecessary repetition and redundant narrative, is therefore not always the best choice.
However, it is another story in subtitling implicitly presumed characters. The identity of a presumed character is retrieved from the verbal description in the current shot/scene and the visual depiction in a different sequence. The thematic/logical relation between the visual and the verbal justifies the adoption of literal translation, which provides viewers with an overlapping information channel for facilitating information recall and constructing film narratives. Furthermore, it is interesting to note that 100% (i.e., 61 instances) of implicit reappearance in TT2 is employed to presume the film’s supporting characters, whose names and images are much less frequently exposed to viewers than the primary ones.
Objects
This section concerns the subtitling of presented and presumed objects in the two Chinese subtitles with a view to revealing the function of visual-verbal interplay in film narratives.
Presented Objects
English nouns and pronouns, going with immediate or gradual visual salience, introducing objects for the first time were annotated and the two Chinese translations of these lexical items were examined to identify the translation procedures used. Please see Table 2 for the numerical summary of the analyzed data.
Table 2. Numerical Summary of Subtitling Presented Objects.
Table 2 shows that explicitation is frequently employed in TT1 to render presented objects with immediate visual salience (i.e., 50%), followed by literal translation (i.e., 38%), while literal translation is the most frequently adopted technique in TT2 (i.e., 75%). Only one case regarding subtitling presented objects with gradual visual salience has been identified, and it is literally rendered in TT1 and TT2.
Being exposed to the immediate visual salience of presented objects, viewers are expected to pay attention to the most essential and relevant information to benefit the film narrative. The subtitling of presented objects is expected to serve this purpose by explicitating the specific and necessary visual information to guide viewers’ attention to the key information (TT1 in Example 4) or to synchronize what the audience sees on screen with what the audience reads in the subtitle (TT1 in Example 5) for the sake of narrating an event.
Example (4) [0:20:49.62–0:20:52.57] ST: Love your disguise, by the way. TT1: 你的妆化得不错 (Your make-up is great.) TT2: 顺便 你装得很像 (By the way, you pretend well.)
Benji and Ethan, two American agents, walk in the Kremlin Palace, disguised as Russian military officials in Russian military uniforms with fake mustaches, fake noses, and fake eyebrows. Looking at Ethan, who is presented in a close-up shot, Benji says “Love your disguise, by the way.” This is a scene-based anaphoric reference between the visual and the verbal. In the TT1, “your disguise” is explicitated into 你的妆 (your make-up) to foreground the detailed visual information and direct the audience’s attention to the fact that Ethan is disguised as a different person. In the TT2, “your disguise” is literally rendered as 你装得 (you pretend), implying imitating a person’s manner rather than his external features, which therefore fails to guide viewers’ attention to what they are expected to focus on, that is, the make-up that Ethan wears. The narrative is thus undermined, as TT viewers might ask how Americans can look like Russians.
Example (5) [1:02:55.10–1:02:58.78] ST: The lens might be a little uncomfortable. TT1: 镜头可能会让眼睛有点不舒服 (The camera lens might make eyes a little uncomfortable.) TT2: 镜片可能戴着不太舒服 (It might not be comfortable to wear lens.)
Agent Jane is helping Agent Brandt to put in a contact lens, saying “The lens might be a little uncomfortable.” A close-up shot then shows the lens auto-focusing in the eye to take photos, implying that the lens is, in fact, a camera lens. This special visual effect of auto-focusing is only accessible to film viewers, but not to the characters in the film. The visual and the verbal contribute to a cataphoric reference. The TT1 underscores the communicative situation, which involves the extratextual communication between the action depicted onscreen and the audience (Baumgarten, 2005), by explicitating “lens” into 镜头 (camera lens) to synchronize what the audience sees on screen with what the audience reads in the subtitle, strengthening the visual-verbal narrative. The TT2 镜片 (lens) is a verbatim rendering of the ST, and viewers need to make extra effort to correlate the subtitle with the image to construct the narrative.
Literal translation is found to be most applicable to the translation of watchwords in spy movies. In this case, the visual-verbal reference becomes secondarily important (TT1 in Example 6).
Example (6) [0:28:30.89–0:28:31.95] ST: The nest is empty. TT1: 鸟窝是空的 (The bird nest is empty.) TT2: 盒里是空的 (The box is empty.)
Agent Ethan is searching for a file in an archive in the Kremlin. He opens a box, which is presented in a close-up shot with immediate salience, finds that it is empty, and says “The nest is empty.” This constitutes visual-verbal anaphora. The ST “nest” is literally translated into 鸟窝 (bird nest) in the TT1, but explicitated into 盒子 (box) in the TT2. Given that this conversation takes place in a secret mission carried out by American agents in the Moscow Kremlin, the employment of watchwords is conducive to the film narrative, which outweighs the significance of anaphoric cohesive reference.
The foregoing analysis reveals that explicitation is the most frequently adopted translation technique in TT1 to render presented objects with immediate visual salience, directing viewers’ attention to the specific and necessary visual elements they are expected to focus on for the sake of film narratives. However, in TT2, literal translation is mainly adopted and viewers need to refer to the visual to retrieve the identity of the presented objects. More processing effort is therefore required as viewers must switch from subtitles to visual images and vice versa (Lee et al., 2013).
Presumed Objects
Explicit reappearance arises when sequentially related visual and verbal information reappears in the same shot/scene to presume objects, while the reappearance of thematically/logically related visual and verbal information in different sequences gives rise to implicit reappearance. Please see Table 3 for the numerical summary of the subtitling of presumed objects.
Table 3. Numerical Summary of Subtitling Presumed Objects.
Stands for frequency.
Table 3 shows that explicitation is the most frequently employed technique in TT1 to translate explicitly presumed objects (i.e., 56%) and implicitly presumed objects (i.e., 75%), whilst literal translation is mainly used in TT2 in subtitling both the explicitly (i.e., 75%) and implicitly presumed objects (i.e., 75%).
This finding reveals that the subtitling method for presumed objects differs from that for presumed characters (as shown in Table 2), in which omission is frequently used in TT1 to render the explicitly reappearing characters (though literal translation still occupies the dominant position) and literal translation is pervasively employed in subtitling the implicitly reappearing characters. An interesting question then arises: why are different translation techniques used for presumed objects and presumed characters in TT1?
Cognitive research on subtitle processing provides an answer: viewers tend to focus on faces when looking at a picture (Perego et al., 2010). Therefore, the translational loss entailed by the omission of the characters’ names can be instantly compensated for by the images of the characters’ faces. Meanwhile, viewers’ attention is drawn away from objects because of the focus on faces. Explicitating presumed objects helps to direct viewers’ attention to what the filmmakers expect them to focus on, for both explicitly presumed objects (TT1 in Example 7) and implicitly presumed objects (TT1 in Example 8).
Example (7) [2:00:37.68–2:00:39.08] ST: I’m fine, by the way. TT1: 我的手好了 (My hand is fine.) TT2: 顺便说下 我也没事 (By the way, I’m fine too.)
Benji’s friends are sitting around a table and worrying about a friend’s leg wound. To attract his friends’ attention, Benji raises his left hand and says “I’m fine, by the way,” implying that his hand has recovered from injury. The TT1 incorporates the visually delivered element into the translation by explicitating the term “I” into 我的手 (my hand). This visual-verbal anaphoric reference benefits the narrative as it enhances information recall: Benji’s left hand was injured in the earlier part of the film. The TT2 is a verbatim rendering of the ST. Viewers need to resort to the visual channel for the detailed information to work out what might have happened to his left hand, so “viewers’ reading speed is lowered by their attention being focused on the picture” (Pedersen, 2011, p. 20) and the comprehension of the narrative is slowed down as well.
Regarding implicit reappearance, explicitation is also most frequently employed in TT1 to link the visual and the verbal information, while verbatim rendering is mainly used in TT2 (Example 8).
Example (8) [1:35:19.87–1:35:23.13] ST: In light of our recent efforts, the technology’s . . . TT1: 之前出了这么多问题 我们的设备是不是 . . . (Given that a lot of problems occurred in the past, is our equipment . . .) TT2: 鉴于最近发生的事 这些科技产品 . . . (In light of what happened recently, these science-technology products . . .)
Agent Benji introduces Agent Brandt to a newly designed metal suit that Brandt needs to wear for a mission. The metal suit is a shiny silver color and can be dragged, floating, through a computer core. Worried about its reliability, Brandt says “In light of our recent efforts, the technology’s . . .,” in which “our recent efforts” refers to the problems caused by the equipment designed by Benji shown in the previous sequence. In the TT1, “the technology” is explicitated into 我们的设备 (our equipment) to help build the narrative between the visual elements in the previous sequence and the verbal information in the current shot, that is, a sequence-based exophoric reference, underscoring that Benji’s equipment did bring about dangers to agents. In the TT2, “the technology” is rendered as 这些科技产品 (these sci-tech products), which blurs the visual-verbal reference between the subtitle in the current shot and the visual information in the previous sequence and consequently interferes with the film narrative.
The analysis of the subtitling of presumed objects reveals that explicitation is the dominant technique used in TT1, for both explicit and implicit reappearance, to facilitate identity retrieval by providing semiotic hints about what has happened in the preceding plot. Literal translation is, however, more frequently used in TT2 without taking into account the referential visual information appearing in the same scene (explicit presuming) or in a different sequence (implicit presuming), which somewhat interferes with the narrative, as more effort is required of viewers for identity retrieval.
Discussion and Conclusion
This study examines the Chinese subtitling of presented and presumed characters and objects by incorporating Baumgarten’s (2005, 2008) visual-verbal reference into Tseng’s (2013) model of cohesive reference in film, which provides a multimodally oriented understanding of how narrative is constructed in the subtitling process. It is found that the visual information of the character in the preceding or following shot/scene should be incorporated in translation via explicitation for the purpose of a complete film narrative. Explicitation is also frequently adopted to render the uniquely presumed participants in the DVD version to meet Chinese audiences halfway, while literal translation is more frequently employed in the YYeTs version to introduce new world knowledge to Chinese audiences, which might create comprehension barriers in film narratives. In the case of explicitly presumed characters, the sequentially related visual and verbal information reappears in the same shot/scene; here the verbal is frequently omitted in the DVD subtitle, whereas verbatim rendering is mainly used in the YYeTs subtitle. In the case of implicitly presumed characters, in which the verbal information of characters is presumed in the current shot and the corresponding visual information appears in a different sequence, literal translation is overwhelmingly dominant in both the DVD and the YYeTs subtitles. It fills the information gap and lessens the processing effort viewers need to relate what is said in the current shot to what has been shown in a different sequence in order to construct the narrative.
These findings bring to the fore the debate on whether subtitle translation should be redundant (Lång, 2016) or non-redundant (Díaz Cintas & Remael, 2021; Ghia, 2012). Lång (2016) argues that two overlapping information channels enhance the acquisition of information, while Díaz Cintas and Remael (2021, pp. 146–147) maintain that “subtitles are rarely a verbatim and detailed rendering of the spoken text, and they need not be. Since subtitles interact with the visual and oral channels of the film, a complete translation is, in fact, not always required.” The current study contends that redundancy, realized by means of explicitation or literal translation, is applicable to subtitling presented characters and uniquely/implicitly presumed characters, as it promotes the narrative flow by integrating necessary and detailed visual information in the subtitle. This aligns with the proposal that the incorporation of visual messages into subtitling, whereby “both visual information and source verbal information are verbally conveyed but visually presented through subtitles” (Chen & Wang, 2019, p. 210), enhances the degree of relevance “through aiding viewers’ processing effort” (ibid., p. 194). Non-redundancy, realized via omission or generalization, is applicable to subtitling explicitly presumed characters with the aim of producing concise subtitles for an efficient comprehension of the narrative. After all, the default value for subtitle translation is “reduced content” (Pedersen, 2011, p. 212).
The same holds for the subtitling of presented/presumed objects. Explicitation and generalization are frequently adopted to translate presented objects in the DVD version, as translation is a way to “customize the film text to its addressees” (Baumgarten, 2005, p. 93). Subtitles should attract viewers’ attention to what the original film expects the ST audience to pay attention to. The verbatim rendering in the YYeTs version fails to do so and might undermine the narrative. Explicitation is also the predominant technique in the DVD subtitle for rendering implicitly/explicitly presumed objects, substituting vague, indefinite, and ambiguous lexical units with referentially explicit and denotatively precise expressions. The aim is to facilitate the communication between the visual information onscreen and the audience (Baumgarten, 2005) and to advance the unfolding of the plot. Literal translation in the YYeTs subtitle fails to establish the link between the visual and the verbal, hinders identity retrieval, and therefore undermines the visual-verbal narrative. It is hoped that the current study, in which underscoring the visual element proves a blessing in disguise for subtitle translation (Díaz Cintas & Remael, 2021), can highlight the significance of visual elements for subtitling visual-verbal references in film for the sake of film narratives.
Due to word limits, only the subtitling of visual-verbal reference of characters and objects is examined in the current study. The other two salient elements that viewers attend to in their viewing process, namely actions and settings, are not touched upon. Future research could integrate the study of subtitling characters, objects, actions, and settings to give a fuller picture of the function of visual-verbal reference in constructing film narratives.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
