ElevenLabs Voice Cloning: Investigating the Transfer of Egyptian Humor Into English-Dubbed Instagram Reels

Abstract

This study investigates the efficiency of ElevenLabs voice cloning software in dubbing Egyptian humor in Instagram reels into English. It also examines how replicating the tonal and pitch nuances of the original actors’ voices influences the translatability of humor into the target language. To examine the transfer of the cultural and pragmatic aspects of humor, the study conducts a multi-layer analysis of eight reels, adopting a multimodal framework that includes Martínez-Sierra’s taxonomy of humorous elements and Juckel et al.’s typology of humor techniques. To assess how far the influence of prosodic cues on humor is preserved, an acoustic analysis of both the source and target audio was conducted using Praat. Two questionnaires were designed to examine viewers’ perceptions of voice cloning as a tool for dubbing humor, addressed to two participant groups: native English speakers and Arabic-English bilinguals. The findings reveal that voice cloning is more efficient than conventional dubbing and subtitling in enhancing tonal consistency, particularly in conveying Egyptian performed humor into English. However, further improvements are needed in terms of voice naturalness and accuracy in transferring the cultural aspects of humor. The analysis also sheds light on the challenges that cultural and linguistic nuances pose for humor translation.

Keywords

voice cloning, artificial intelligence, dubbing, humor, Instagram reels, ElevenLabs

Introduction

Technological advancements in machine translation (MT), machine learning (ML), and artificial intelligence (AI) have encouraged the audiovisual industry to globalize its products and enter new markets. Various modes of audiovisual translation (AVT), including subtitling, dubbing, voice-over, and simultaneous interpretation, are used to present audiovisual content in other languages. However, these modes are subject to varying degrees of limitation in terms of accuracy, time constraints, and financial cost.

In AVT, humor translation has long been problematic, especially when rendered through the two main modes, namely subtitling and dubbing, because humor is perceived subjectively and is culturally specific (Dore, 2019). It is even more problematic in dubbing because the audience hears only the voices of the dubbing actors instead of the original actors’ voices, which reduces the emotional intensity that the actors’ tonalities lend to humor (Baños & Chaume, 2009). Moreover, humor bound to a certain dialect is far trickier to capture in other languages (Zabalbeascoa, 2005). For instance, Egyptian humor has its own cultural characteristics, shaped by the interplay of colloquial linguistic features, social culture, economy, and political ideology. Such Egyptian-bound features of humor pose a significant problem when they are to be dubbed or subtitled into another language, such as English. The Egyptian dialect has also gained popularity, especially in film and television production, as Egypt is a pioneer of this industry in the Arab region, and its comedic productions in particular enjoy wide audiences in other parts of the Arab world. However, it is difficult to dub this dialect into another language or dialect without a loss in the intended humor (Al-Abbas & Haider, 2021).

In turn, emerging AI and deep learning-based technologies such as deepfakes and voice cloning have proved highly significant in the audiovisual field by mimicking the human voice (Gambier, 2008). Voice cloning uses pre-trained voice reference datasets to create a synthesized voice that closely matches the original human voice. Many production companies can now dub their products, such as movies, TV shows, and documentaries, into other languages using the cloned voices of the original actors, giving audiences a more immersive experience of their favorite TV shows and movies. Therefore, voice cloning should be investigated as a new mode of AVT (Williams, 2024). It offers a new creative method of conveying humor in dubbed audiovisual materials across languages and cultures by replicating a voice’s subtle nuances, including tone, timbre, and intonation, in place of conventional dubbing, which can sound unnatural.

Therefore, this study investigates the use of voice cloning in dubbing Instagram reels of Egyptian comedy shows into English. It examines the extent to which the sense of humor and the cultural aspects in these reels are professionally dubbed into English using ElevenLabs voice cloning, focusing on the replication of the tonal nuances of the actors’ voices. It also assesses the extent to which the cultural aspect of humor is preserved and effectively conveyed by highlighting the translation strategies the content creator has opted for. Most significantly, the study investigates the perceptions of audiences, specifically native English speakers and Arabic-English bilinguals, toward AI-assisted dubbing using voice cloning by examining its performance in terms of humor transfer compared to conventional dubbing and subtitling. Accordingly, the current study aims to answer the following questions:

To what extent does ElevenLabs voice cloning software work efficiently in dubbing Egyptian comedy Instagram reels into English?

To what extent is the sense of humor in Egyptian comedy shows professionally dubbed into English using ElevenLabs voice cloning?

What are the audiences’ attitudes toward the efficiency of voice cloning in dubbing Egyptian reels into English?

Background

Díaz-Cintas (2009) defines dubbing as replacing the original soundtrack in the source language with a soundtrack in the target language so that the audiovisual content is presented in a way that fits the target audience’s culture and norms. However, dubbing is not limited to providing a translation with post-production synchronization. A more recent and comprehensive definition is given by Chaume (2020, p. 104), who describes dubbing as “a linguistic, cultural, technical and creative team effort that consists of translating, adapting and lip-syncing the script of an audiovisual text.”

Dubbing was first introduced as an AVT modality as a result of the emergence of talking movies, commonly known as “talkies” (Chaume, 2020). This mode has gained more popularity in some countries than in others, which can be attributed to several factors, including the economy, technology, audiences’ cognitive reception of languages, and political agendas. For instance, where illiteracy rates were high, most people could not read subtitles (Chaume, 2020). Therefore, film industries found that switching to dubbing, though more costly than subtitling, was a must in order to bring their audiovisual products to a wider audience of various educational backgrounds (Chaume, 2020).

The relatively high cost of dubbing is attributed to its complex process. According to Chaume (2020), dubbing requires more than professional voice actors and actresses recording their voices in a studio. An audiovisual product goes through four steps before the dubbed version is released. First, the script of the audiovisual product is translated into the target language, usually literally, by a translator. Then, a dubbing translator adapts the raw translation so that it sounds natural in target-language dialog, making sure that the dubbed recordings are properly synchronized in terms of the duration of the soundtrack’s “takes” (isochrony), lip movement (lip synchronization), and the body language of the on-screen actors (kinesic synchrony). Recent speech synthesis technologies provide new potential that could be a game changer in AVT in general and in dubbing in particular. Voice cloning is one of the new speech synthesis tools that may revolutionize the dubbing of audiovisual content across languages and cultures (Chaume, 2020).

Speech Synthesis and Voice Cloning

Speech synthesis is defined as “a process of automatic generation of speech by machines/computers” (Balyan et al., 2013, p. 57). This voice technology aims to enable machines to act intelligently in producing natural, human-like, synthesized voices that can be integrated into, and can revolutionize, other disciplines and fields. The synthesized voice is generated by analyzing and simulating the acoustic properties of the human voice, breaking it into smaller vocal units such as phonemes (Klatt, 1987).

Voice cloning is one of the emerging technologies in speech synthesis, arising from the fine-tuning of deep learning neural networks for speech synthesis on samples of a human voice. It is a highly personalized branch of speech synthesis that replicates the human voice with its complex nuances and embeddings (Ma et al., 2024). Napolitano (2020) describes voice cloning as a “mask” of the invariable structure of text-to-speech synthesis. The advent of multilingual, multi-speaker, end-to-end neural networks has enabled speech synthesis models to generate cloned voices by identifying the voice embeddings of a particular person’s voice through a so-called speaker encoder (Jemine, 2019). A speaker profile is then built to create a clone that closely resembles the natural voice of that speaker (Napolitano, 2020), without the burden of training models on large amounts of labeled recorded data. The quality of synthesized voices, and of personalized cloned voices in particular, has improved drastically with the use of new models that learn from specific audio samples of a person rather than assembling and sorting vast amounts of data (Hu & Zhu, 2023).

Voice Cloning as a Means of AVT

Voice cloning has now been integrated into various aspects of life, including entertainment, education, communication, and media accessibility. It has proved highly significant in AVT as an inclusive, practical solution in terms of production time and cost (Thomas, 2024). The proliferation of social media platforms, alongside the global availability of the internet, has also increased the volume of varied user-generated audiovisual content, which is usually oriented toward a global audience (Maksymchuk et al., 2023). Therefore, voice cloning applications facilitate both the creation of content, especially user-generated content, and its provision with a more innovative and immersive real-time AVT (Hatami et al., 2024).

In other words, voice cloning reinforces the existing tendency toward dubbing by using the original voice of the speaker rather than a voice talent from the target language. When the cloned voice of the original speaker is used, viewers may feel that the dubbed content was originally created in their first language. Unlike conventional dubbing, voice cloning also transfers the prosodic features of the original voice that convey certain pragmatic meanings, and humorous meanings in particular.

According to Misener (2017), the Canadian startup Lyrebird developed practical voice cloning software that replicates voices, first demonstrated in 2017 by cloning former U.S. President Donald Trump’s voice. By training on speech recordings, Lyrebird’s software can recreate a voice using just one minute of audio. Users can create personalized “Lyrebird Avatars” for various purposes, such as chatbots, audiobooks, and video games. Lyrebird also supports ALS patients by helping them preserve their voices through “Project Revoice,” a non-profit initiative that uses deep learning to clone their voices, allowing them to communicate in their natural voice instead of a generic one (Project Revoice, 2019).

A recent study by Williams (2024, p. 132) describes the process simply: “a voice is cloned by training a machine-learning algorithm, most often a deep neural network, using existing audio recordings to learn the unique characteristics of a voice.” Williams (2024) discusses the potential of voice cloning in filmmaking, highlighting its capacity to replicate the voices of deceased individuals and bring them back to life with their distinctive prosodic features. For instance, he reflects on the use of voice cloning to replicate the voice of Anthony Bourdain, an American author, chef, and travel documentarian who died in 2018, for Roadrunner, a 2021 documentary on his career. His AI-generated voice is integrated into the film as a voiceover that is synced with his mouth and facial expressions.

Several models tackle voice cloning using neural networks fine-tuned on text-audio data, including Deep Voice 1 (S. Arik et al., 2017), Deep Voice 2 (Arik et al., 2017), and Tacotron (Wang et al., 2017). For instance, the multi-speaker generative model proposed by S. Ö. Arik et al. (2018) is fine-tuned on a set of audio-text pairs. Then, for voice cloning, a speaker encoder is used to extract the embedded characteristics of an unseen speaker, such as accent and gender. Visual voice cloning is another innovative model, which converts text to speech with a desired voice characteristic taken from an audio reference and derives the emotions of the audio from a video reference. This model is used to develop new synthetic speech apps and online software (Chen et al., 2022).

ElevenLabs

ElevenLabs is an AI audio research and deployment company that aims to make content universally accessible in any language and in any voice (ElevenLabs, 2024). It was founded in 2022 by Piotr Dąbkowski, an ex-Google machine learning engineer, and Mateusz Staniszewski, an ex-Palantir deployment strategist (ElevenLabs, 2024). Currently, the software supports 32 languages. The ElevenLabs online platform houses nine AI-driven voice tools: text-to-speech (TTS), speech-to-speech (STS), text to sound effects (SFX), voice cloning, dubbing studio, voice isolator, voiceover studio, voice library, and audio native. Some of these tools are offered for free, but with limits on the number of characters in text-to-speech synthesis and the number of voices for cloning.

Humor in AVT

Zabalbeascoa (2020, p. 669) provides a comprehensive definition of humor as “a quality of a statement, action or situation that makes a person laugh or smile or feel similarly amused; consequently, a sense of humor is the ability to recognize, appreciate and sometimes produce such actions, situations, and statements.” Humor is highly problematic in terms of linguistic and cultural transfer. As in literary translation, its interdisciplinary nature poses a challenge for translators. This is evident in the wide range of theoretical approaches proposed for tackling humor in translation, including the Semantic-Script Theory of Humor (Raskin, 1985), the General Theory of Verbal Humor (Attardo & Raskin, 1991), Relevance Theory (Sperber & Wilson, 1986), Skopos theory (Vermeer, 1989), and verbally expressed humor (Chiaro, 2024). Each theory classifies humor from its own research perspective. Transferring humorous audiovisual content adds to the already constrained and multi-semiotic nature of AVT (Dore, 2020). In particular, transferring humor-based audiovisual content such as sitcoms, plays, stand-up comedy, and comedy films demands greater attention to preserving or reconstructing humorous patterns for the target audience (Zabalbeascoa, 2005).

In the AVT of humor, translators should consider the cultural and social background of the target audience, the linguistic and dialectal aspects of the text, and the audience’s cognitive perspective. Therefore, translators tend to adopt functional approaches such as “Text Analysis in Translation” (Nord, 1988/2006; 1997), Skopos theory (Vermeer, 1989/2012), and Nida’s dynamic equivalence (Nida, 1964, p. 159) to transfer humor in comedy AV content.

Prosody of Egyptian Humor

Arabic performed humor is often conveyed through prosody rather than lexico-grammatical content, particularly when dialects are involved. While Modern Standard Arabic is used in formal contexts, Colloquial Arabic encompasses the regional dialects used in social communication, including films and social media (Nassif, 2021). Egyptian comedy, in particular, enjoys wide appeal due to the unique humor encoded in its dialect, with Egyptians being known as “ibn nukta” (son of the joke). Egyptian humor relies heavily on personal characteristics such as body language, facial expressions, and prosody, making it difficult to capture effectively in languages like English (Shehata, 1992, p. 76).

Previous Studies

Few studies have tackled dubbing with voice cloning, either worldwide or in the Arab world in particular. Regarding the integration of AI as a tool in AVT in general and in cross-lingual dubbing in particular, Sanyal et al. (2024) conduct an experiment to evaluate the ability of their proposed speech synthesis model to clone the voice embeddings, expressions, and emotional tone of actors from Hindi into English. The study further investigates the capacity of the proposed model to distinguish the background noise and music of the movies and transfer them in good quality. The researchers propose a speech synthesis model with two stages of voice modeling. The first stage involves speech extraction, which consists of a Denoiser and Spleeter. These components are of great significance since they distinguish and separate the vocals from the background music. The data of the study were collected from over 50 Bollywood movies of various genres, such as action, comedy, romance, and thriller. By including various genres, the data aim to capture a wide range of emotions and storytelling styles.

Most importantly, they adopt a manual transcription and dubbing method because automated transcription and dubbing lack accuracy, especially when dealing with diverse accents, tonal patterns, and background noise. To evaluate the performance of the model, they employ both qualitative and quantitative methods. For qualitative analysis, the Mel Cepstral Distortion and Relative Fréchet Distance metrics are used to evaluate the preservation of the expressive aspects of the dubbed speech in terms of emotional content and naturalness. Quantitative analysis is employed to evaluate the performance of the dubbed speech in terms of clarity, intelligibility, and overall audio quality. The quantitative analysis reveals that the proposed model consistently excelled in preserving speaker expression and reducing background noise, outperforming existing state-of-the-art models. The results of the STOI metric and the Relative Fréchet Distance also reveal that the model is efficient in preserving the embeddings, expressions, and tonal patterns of the original Hindi audio when dubbed into English.

Federico et al. (2020) propose an enhanced text-to-speech (TTS) pipeline for multilingual dubbing. To test its performance, the researchers conduct an experiment evaluating automatically dubbed TED Talk videos from English into Italian. The pipeline integrates improved machine translation to control the length of translations, ensuring optimal alignment with the original audio’s timing and segmentation. It employs a neural TTS system consisting of a Context Generation module and a Universal Neural Vocoder. The Context Generation module is trained on speaker-specific Italian voices, while the Vocoder is pre-trained on a large dataset of voices. To synchronize the length of the Italian speech with the original English audio, the system resizes the output accordingly. Additionally, a U-Net architecture is used to separate speech from background audio, allowing the latter to be re-inserted, with reverberation, into the Italian dubs. An evaluation of voice naturalness was conducted on 24 selected clips. The participants, both Italian natives and non-Italians, focused on the resemblance of the dubbed speech to human speech in terms of acoustics and synchronization. The results highlight that while the pipeline achieves good synchronization at the phrasal level, prosodic alignment negatively impacts fluency and prosody. These disfluencies significantly affect Italian listeners, although non-Italian listeners perceive increased naturalness due to the enhanced audio rendering with background noise and reverberation.

Pérez et al. (2021) examine the use of text-to-speech synthesis systems at the Universitat Politècnica de València (UPV) to investigate the integration of such systems in educational institutions. The study focuses on training TTS models on UPV’s repository of educational videos, MediaUPV, to enable cross-lingual voice conversion for Spanish educational content. MediaUPV contains poliMedias (high-quality short recordings by UPV lecturers) and poliTubes. However, only poliMedias are used due to their consistent quality and simplicity. The baseline system aims to produce Catalan and English auto-dubbing for these Spanish videos. To evaluate the system, 98 UPV academic staff members participated in recording clean speech data during the 2016 to 2017 and 2017 to 2018 academic years. All participants were native Spanish/Catalan speakers, with an average age of 50, and were evenly distributed by gender. Participants recorded at least 300 sentences in Spanish, Catalan, or English, with a minimum of 150 in each language, under controlled acoustic conditions. A set of 8,820 speech samples synthesized by Tacotron2-UPV was assessed by 47 lecturers on the basis of naturalness, speaker similarity, realism, and survey feedback. The qualitative analysis reveals that Catalan and English score highly in terms of naturalness, speaker similarity, and realistic cloning. These results support the researchers’ conclusion that TTS technology is mature enough for massive machine dubbing of educational videos, even in cross-lingual cases.

Voice cloning technologies have also proved highly significant in the field of video game localization. Nițu (2024) investigates the application of AI-driven automatic text translation and voice dubbing from English to Romanian in video games. The study employs ElevenLabs’ TTS model to clone the original voices of the actors in two games, namely Pokémon FireRed and Fallout 4. The findings reveal the potential of AI-cloned voices to capture vocal characteristics, offering “a unique perspective on the game’s story and characters which adds to the cultural value of localization beyond a simple translation” (Nițu, 2024, p. 7). The researcher concludes that incorporating AI voice cloning software such as ElevenLabs can add to the playability value of the game.

Methods and Procedures

Data Collection

To investigate the role of voice cloning software, specifically ElevenLabs, in recreating the humor of various Egyptian comedy shows in English, the data, that is, Instagram reels, were obtained from two Instagram accounts: @vinga_23 and @motat.altaleem. The corpus of the study consisted of 65 reels excerpted from different Egyptian plays and movies. The sample, however, consisted of eight representative reels that were purposefully selected to fit the study’s scope and to avoid repetitiveness, as the majority of the reels share the same prosodic features and humorous effects. In other words, the inclusion criteria were tied to the questions and objectives of the study. In addition, only reels that are popular in the Arab world and have achieved high viewing and preference rates were included. Moreover, the study selected only those reels whose cloned voices strongly match the original ones, in order to support the hypothesis of the study. The exclusion criteria, in turn, covered reels that are unpopular in the Arab world and those whose voices match the original ones only weakly.

These clips were downloaded using the free website savefrom.com. For the acoustic analysis, audio tracks were extracted from the aforementioned reels in WAV (Waveform Audio Format) using the free website Convertio. These Instagram reels were created from various Egyptian comedy shows, including plays such as School of Mischief, “Madrast Al-Mushaghebeen,” 1973; No Longer Kids, “El-Al-Eyal Kebret,” 1979; and The Married Couples, “Al-Motazawegoon,” 1976. The researchers used their own Instagram (IG) account to collect the data.

The study faced several limitations. A key limitation concerned data collection. The data were available on Instagram and obtained from two accounts, @motat.altaleem and @vinga_23. However, a large portion of the data was lost because one of the accounts, @vinga_23, is no longer available on Instagram. Furthermore, the researchers had difficulty contacting the content creators for detailed information about the use of ElevenLabs in cloning the content. Moreover, access to information about all ElevenLabs tools is limited to premium subscribers. The cultural sensitivity that surrounds Egyptian humor also added to the difficulties of the analysis, and the lack of literature in this area, especially on dubbing from Arabic into English, presents a further key limitation. Finally, due to time constraints, the analysis addressed only a few samples of the data.

Questionnaires

The participants of the study were 22 persons: 11 native speakers of English (eight females and three males) who were studying Arabic for non-native speakers at the Language Centre at Yarmouk University, and 11 English-Arabic bilinguals (six males and five females) who hold an MA in translation studies. The participants were orally informed about the objectives of the study and asked whether they wished to take part in the questionnaire. All participants gave their consent and were happy to answer the questionnaire. They confirmed that participation involved no harm or risk. The questionnaire was therefore administered in a friendly and comfortable atmosphere to ensure genuine results and effective participation.

To analyze participants’ perceptions of the use of voice cloning in dubbing Egyptian comedy into English, two questionnaires were designed. The first questionnaire was addressed to the 11 native English speakers. It consisted of 18 questions grouped under seven categories. Six of these sections were obligatory, including general perception of dubbed reels; voice cloning and authenticity in the dubbed reels; comedic timing and delivery; cultural understanding and adaptation; and comparison between the dubbed and subtitled comedic content. Another section was added for demographic information. The last section contained two questions related to the participant’s familiarity with Egyptian humor and with watching dubbed content. Similarly, a second questionnaire was designed for the 11 English-Arabic bilinguals. It consisted of 17 questions grouped under the same six obligatory sections as the first questionnaire, but was accompanied by the original Arabic content as a reference, and it was intended to bring insights into their experience and preferences in terms of dubbed and subtitled content. Each section of the two questionnaires was hyperlinked to the dubbed reels and the corresponding original Arabic content with English subtitles. The participants assessed the role of voice cloning in dubbing in terms of naturalness, authenticity of voices, the preservation of the emotions and tone of the original, cultural understanding and adaptation, and comedic timing and delivery. To check the validity and reliability of these two questionnaires, they were presented to and approved by a jury consisting of two professors of translation.

Procedures

The qualitative approach was more consistent with the study objectives. The analysis comprised three stages. The first stage was concerned with collecting the data from the two aforementioned IG accounts. Then, clips of the original Arabic content with English subtitles were obtained via Netflix for participants’ comparative assessments. The sample chosen for descriptive analysis was manually transcribed by the researchers for both the source Arabic comedic reels and the English-dubbed reels. The scripts were then descriptively analyzed on the basis of a multimodal theoretical framework. The second stage comprised the acoustic analysis of humorous prosodic cues. As explained above, audio tracks were extracted from the eight reels in WAV format via the free website Convertio. The audio samples were analyzed in Praat, an open-source tool for speech analysis. The third stage involved the analysis of participants’ perceptions and individual assessments in terms of naturalness, speaker similarity, comedic timing and delivery, cultural adaptation, and the preservation of the humorous tone of the original Arabic content.

To ensure the reliability and validity of the data collected and analyzed, a jury of three professors of translation evaluated the questionnaire, and their suggestions were taken into consideration. In addition, the reels were coded by ElevenLabs’ tool, so the researchers took these reels as they were coded without making any amendment. The researchers were thoroughly trained in using Praat in order to capture the prosodic features of the source and target voices. They also consulted a professor specialized in phonology to validate the matching of the source and target voices.
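The kind of source-versus-target pitch comparison involved in the acoustic stage can be illustrated with a minimal sketch. It is not the procedure used in this study, which relied on Praat; it assumes a deliberately simplified autocorrelation peak-picking estimator (Praat's own pitch tracker is more refined), and it runs on two synthetic tones rather than on the study's audio. The function names `synth_tone`, `estimate_f0`, and `semitone_shift`, as well as the 200 Hz and 250 Hz values, are hypothetical and chosen for illustration only.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, n_samples=400):
    """Return a pure sine tone; a hypothetical stand-in for one voiced frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def estimate_f0(frame, sample_rate=8000, fmin=75.0, fmax=600.0):
    """Estimate F0 in Hz as sample_rate / lag at the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def semitone_shift(f0_source, f0_target):
    """Pitch difference of the target relative to the source, in semitones."""
    return 12 * math.log2(f0_target / f0_source)


# Synthetic "source" and "target" frames (assumed values, not study data).
source_f0 = estimate_f0(synth_tone(200.0))
target_f0 = estimate_f0(synth_tone(250.0))
shift = semitone_shift(source_f0, target_f0)
print(f"source {source_f0:.1f} Hz, target {target_f0:.1f} Hz, shift {shift:.2f} st")
```

Expressing the difference in semitones rather than in hertz makes the comparison relative to the speaker's own pitch, which is one common way of judging whether a dubbed voice keeps the contour of the original.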

Theoretical Framework

For the descriptive analysis, a multimodal framework was constructed to answer the study questions. To categorize the humorous elements in both the original Arabic content and the English-dubbed reels, the study adopted Martínez-Sierra’s (2006) taxonomy of humorous elements. Furthermore, Juckel et al.’s (2016) humor typology was adopted to group humor into four macro categories in AVT, each of which contains sub-techniques of humor. Martínez-Sierra (2006, pp. 290–291) identifies the following eight types of humorous elements:

Community-and-Institutions Elements refer to cultural or intertextual features that are rooted and tied to a specific culture.

Community-Sense-of-Humor Elements, the topics of which appear to be more popular in certain communities than in others, an idea that does not imply any cultural specificity, but rather a preference.

Linguistic Elements are based on linguistic features. They may be explicit or implicit, spoken or written.

Visual Elements comprise a differentiation between the humor produced by what we can see on the screen and those elements that in fact constitute a visually coded version of a linguistic element.

Graphic Elements: This type includes the humor derived from a written message inserted in a screen picture.

Paralinguistic Elements. This group includes the non-verbal qualities of a voice, such as the intonation, the rhythm, the tone, the timbre, the resonance, etc., which are associated with expressions of emotions such as screams, sighs, or laughter.

Non-Marked (Humorous) Elements represent miscellaneous instances that are not easily categorized as one of the other categories but are, nevertheless, humorous. They may have either an acoustic or a visual form, and can be either explicit or implicit.

Sound Elements. They are sounds that by themselves or in combination with others may be humorous. They are explicitly and acoustically found in the soundtrack and the special effects when these contribute to the humor.

Juckel et al. (2016) propose a comprehensive typology for humor in sitcoms. The typology divides humor into four categories: language, logic, identity, and action. Each category includes a group of humor techniques. However, to address the problem of the study, action techniques were not included in the analysis (Figure 1).

Figure 1.

Juckel et al.’s (2016) proposed typology for humor.
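The two coding schemes can be combined into a single annotation record per humor instance. The following is a minimal sketch only: the names `HumorElement`, `HumorCategory`, `HumorInstance` and the field `preserved` are hypothetical and belong to neither framework, and placing absurdity under the logic category is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class HumorElement(Enum):
    """The eight categories of humorous elements listed above (Martinez-Sierra, 2006)."""
    COMMUNITY_AND_INSTITUTIONS = "community-and-institutions"
    COMMUNITY_SENSE_OF_HUMOR = "community-sense-of-humor"
    LINGUISTIC = "linguistic"
    VISUAL = "visual"
    GRAPHIC = "graphic"
    PARALINGUISTIC = "paralinguistic"
    NON_MARKED = "non-marked"
    SOUND = "sound"


class HumorCategory(Enum):
    """The four macro categories of Juckel et al. (2016)."""
    LANGUAGE = "language"
    LOGIC = "logic"
    IDENTITY = "identity"
    ACTION = "action"  # not included in this study's analysis


@dataclass(frozen=True)
class HumorInstance:
    """One annotated humor instance (hypothetical record layout)."""
    reel: int
    target_line: str
    elements: Tuple[HumorElement, ...]
    technique: str
    category: HumorCategory
    preserved: bool  # whether the humorous load survives in the dub

    def __post_init__(self):
        # Action techniques are outside the scope of the analysis.
        if self.category is HumorCategory.ACTION:
            raise ValueError("action techniques are not part of the analysis")


example = HumorInstance(
    reel=1,
    target_line="Impossible. This is the obituary page.",
    elements=(HumorElement.COMMUNITY_AND_INSTITUTIONS, HumorElement.PARALINGUISTIC),
    technique="absurdity",
    category=HumorCategory.LOGIC,  # assumed mapping, for illustration only
    preserved=True,
)
```

A record of this kind keeps the element category, the technique, and the outcome of the transfer side by side for each instance.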

Findings and Discussion

Introduction

This chapter presents the findings and discussion of the role of voice cloning in dubbing Egyptian humor into English in Instagram Reels. To evaluate the validity of the study’s hypothesis, the study conducts a mixed-method analysis. The first layer of analysis employs a multimodal framework comprising Martínez-Sierra’s (2006) taxonomy of humorous elements and Juckel et al.’s (2016) typology of humor techniques. Each instance of humor in the data sample is analyzed in terms of the categories of humorous elements, the humor techniques, and the subtitling strategies adopted in conveying these instances into English. To examine the role of prosodic cues in recreating humor in the target language, the study conducts an acoustic analysis of both the source and target audio using Praat. The second layer of analysis examines viewers’ perceptions of the efficiency of voice cloning in conveying Egyptian humor in English-dubbed reels.

Analysis of VC Performance in Conveying Egyptian Humor in the English Dubbed Reels

Context: The interlocutors, Masoud and Lina, a newlywed couple, are talking about the soup and the Al-Molokhia (jute mallow) dish that Lina is supposed to make with the chicken that Masoud brought earlier (Table 1).

Table 1.

Example (1) a dialogue between Lina and Masoud, The Married Couples, “Al-Mutazawjoun.”


Source excerpt

Target excerpt

Lina: مسعود‎
Masoud:نعم‎
Lina: ما تزعلش مني لو قلتلك ملقيتش صفحة الشربة في كتاب ابلة النظيرة‎
Masoud: يعني ايه؟ ما عملتيش شربة؟‎
Lina: لا وانا هعمل منين معرفش الطريقة‎
Masoud: ايه يا لينا ايه يا لينا الفرخة تسخينها بالميه تبقى الشربة اخص عليكي‎
Lina: معلش ما تزعلش‎
Masoud: مزعلش اييه بس ده انا قاعد هموت‎
Lina: الفرخة جاية بقى انا هعملها‎
Masoud: الفرخة الجاية ايه يا شيخه انتي بتحلمي الفرخة الجاية ! الشوربة يا ماما ده الشوربة..... امري لله يلا مش الملوخية معمولة من الفرخة دي امتع شي ء في ياا سيدي‎
ادوء ولا اشرب؟‎
Lina: تدوء الأول وتقولي رأيك ايه‎
Masoud:رأيي معروف رأيي معروف يا ست الكل يا احلى زوجة في الدنيا كان نفسياعزم اصحابي مش يأكلوا يتفرجوا‎
......بس لا الملوخية أصلهاLina: !ة‎حلو
Masoud: انت عملتيها ازاي‎
Lina: زي كتاب ابلة النظيرة تمام كلمة كلمة‎
Masoud: ده كتاب نظيرة؟‎
Lina: هو ده‎
Masoud: صفحة الملوخية؟‎
Lina: صفحة الملوخية‎
Masoud: مش ممكن دي صفحة الوفيات يا لينا‎

Lina: Masoud.
Masoud: Yes.
Lina: Don’t be upset with me if I tell you I couldn’t find the soup page in Abla Nazira’s book.
Masoud: What do you mean you didn’t make it?
Lina: No. How could I? I don’t know how.
Masoud: Why, Lena? Why? If you heat up the chicken, the water turns into soup.
Lina: I’m sorry. Don’t be upset.
Masoud: How could I not be upset? I was looking forward to it.
Lina: Next chicken.
Masoud: What are you talking about? You’re dreaming. The soup is the most delicious thing. . . . Fine. Did you make AL-Molokhia with chicken ? I just eat it like that. Oh, what a beauty !
Masoud: Should I taste it first, or drink it?
Lina: Taste it first, then tell me what you think.
Masoud: It’s obviously going to be amazing ..The most beautiful wife in the world.. I wish my friends were here. . .not to eat, but just to watch.
Lina: Yeah
Masoud: Because the Molekhia. . .<pause>
Lina: Delicious?!
Masoud: How did you make it?
Lina: Just like Abla Nazira’s book said exactly word for word.
Masoud: Nazira’s book?!
Lina: Yes, it is.
Masoud: Is this the Molekhia page?
Lina: Yes It’s the Molekhia page
Masoud: Impossible. This is the obituary page. Lena!

Humorous Elements

Community-and-Institution Elements

In example (1) above, the humorous load embedded in community-and-institution elements appears in Egyptian cultural references, namely Al-Molokhia and Abla Nazira’s book. Al-Molokhia (jute mallow) is a popular dish served mainly in Egypt and the Levant (Syria, Lebanon, Palestine, and Jordan), as well as in other Arab regions. It is commonly cooked with chicken for flavor and served with white rice. The reference to Abla Nazira’s book is likewise specific to Egyptian culture. Abla Nazira was a well-known Egyptian chef who wrote the first cooking encyclopedia in the Arab world in the 1940s. The book gained wide popularity, particularly in Egypt, and was used to teach students the art of cooking (Abdelwareth, 2017). Humor is evoked when Masoud describes the Al-Molokhia page as the obituary page, mocking how Egyptian housewives trust Abla Nazira’s book as a biblical cooking reference yet still produce bad-tasting dishes. The humor technique used here is absurdity, since likening the Al-Molokhia page in Abla Nazira’s book to the obituary page of a newspaper is incongruous. The humorous load is maintained by means of imitation and transfer strategies.

Paralinguistic Elements

In addition to humor related to community-and-institution elements, humor is also marked by prosodic cues that, in turn, contribute to the underlying socio-pragmatic meaning. These prosodic cues may mark a specific style of humor related to the context of the comedic show or to the humorous style of its characters. For instance, Masoud’s humorous performance is distinguished by a relatively high mean pitch. Moreover, the high standard deviation of pitch in both the source Egyptian audio and the English cloned audio indicates the expressiveness and emotionality of the voices. The following pitch analysis indicates the close match of pitch variables between the source and target audios (Figure 2).

Figure 2.

The table indicates the close match of pitch variables between the source and the target audios in example (1).
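As a rough illustration of the two pitch variables discussed here, the following sketch computes the mean and standard deviation of a fundamental-frequency (F0) contour over voiced frames only. It assumes the contour is available as a list of values in Hz with unvoiced frames coded as 0. The function name `pitch_summary` and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption rather than a statement about how Praat computes it.

```python
import statistics


def pitch_summary(f0_hz):
    """Return (mean, sample SD) of F0 in Hz over voiced frames.

    Frames with a value of 0 are treated as unvoiced and skipped
    (an assumed export convention).
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.mean(voiced), statistics.stdev(voiced)


# Hypothetical contour values, not measurements from the study.
contour = [0.0, 210.0, 250.0, 290.0, 0.0, 330.0]
mean_hz, sd_hz = pitch_summary(contour)
print(f"mean = {mean_hz:.1f} Hz, SD = {sd_hz:.1f} Hz")
```

A larger standard deviation relative to the mean corresponds to a wider pitch range, which is the property the paragraph above associates with expressiveness.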

To demonstrate that voice cloning preserves the tonality of the original audio in the English cloned audio, the following screenshots from Praat show the close match in mean pitch for part of the exchange between Masoud and Lina (Figures 3 and 4).

Figure 3.

Pitch analysis of the audio taken from the source audio. The mean pitch measures 269.8 Hz.

Figure 4.

Pitch analysis of the audio taken from the target audio. The mean pitch measures 293.7 Hz.

Humor Techniques

Berger (2017) points to the complexity that surrounds humor: many techniques may be present at one time, and while some techniques may not be funny when used in isolation, they work when used in combination with others (Table 2).

Table 2.

Example (2) a dialogue between Masoud and Lina, The Married Couples, “Al-Mutazawjoun”.

Source excerpt	Target excerpt	Humor technique
Masoud: يعني ايه؟ ما عملتيش شربة؟ Lina: لا وانا هعمل منين ما عرفش الطريقة Masoud: ايه يا لينا ايه يا لينا الفرخة تسخينها بالميه تبقى الشربة اخص عليكي Lina: معلش ما تزعلش Masoud: مزعلش اييه بس ده انا قاعد هموت Lina: الفرخة جايه بقى انا هعملها Masoud: الفرخة الجايه ايه يا شيخه انتي بتحلمي الفرخة الجايه	Masoud: What do you mean you didn’t make it? Lina: No. How could I? I don’t know how. Masoud: Why, Lena? Why? If you heat up the chicken, the water turns into soup. Lina: I’m sorry. Don’t be upset. Masoud: How could I not be upset? I was looking forward to it. Lina: Next chicken. Masoud: What are you talking about? You’re dreaming.	Repartee: Most of the humorous load in this exchange is carried by the quick, witty repartee of Masoud (the husband) in his exchange with his wife. The humor is reflected in Masoud’s irritated response to his wife, Lina, when she tries to calm him down by promising to make soup the next time he buys a chicken: “What are you talking about? You’re dreaming.” Masoud’s utterance also uses allusion to refer indirectly to his bad financial situation.
Masoud: انت عملتيها ازاي Lina: زي كتاب ابلة النظيرة تمام كلمة كلمة Masoud: ده كتاب نظيرة؟ Lina: هو ده Masoud: صفحة الملوخية؟ Lina: صفحة الملوخية Masoud: مش ممكن دي صفحة الوفيات يا لينا	Masoud: How did you make it? Lina: Just like Abla Nazira’s book said exactly word for word. Masoud: Nazira’s book?! Lina: Yes, it is. Masoud: Is this Al-Molekhia page? Lina: Yes It’s Al-Molekhia page Masoud: Impossible. This is the obituary page! Lena	Another example of repartee in this exchange is Masoud’s final utterance, “Impossible. This is the obituary page! Lena.” Masoud’s exaggerated claim that the page of the Molekhia recipe is actually the obituary page humorously implies that Al-Molekhia tastes bad.
Masoud: يلا مش الملوخية معمولة من الفرخة يا سيدي أدوء ولا شرب Lina: تدوء الأول وتقولي رأيك ايه Masoud: رأيي معروف رأيي معروف يا ست الكل يا احلى زوجة في الدنيا كان نفسي اعزم اصحابي مش يأكلوا... يتفرجوا بس	Masoud: Did you make AL-Molokhia with chicken? I just eat it like that. Oh, what a beauty! Masoud: Should I taste it first, or drink it? Lina: Taste it first, then tell me what you think. Masoud: It’s obviously going to be amazing. The most beautiful wife in the world.. I wish my friends were here. . .not to eat, but just to watch.	Conceptual surprise: The unexpected change of concept appears in the way Masoud completes his utterance. The audience may have anticipated that Masoud would finish by saying something like “I wish my friends were here; I would have invited them over to eat.” However, Masoud’s utterance comes out as “I wish my friends were here. . .not to eat, but just to watch,” evoking humor. Self-deprecation: The humor lies in the self-deprecation technique, which points to Masoud’s low social background and a financial situation so bad that he could not be hospitable enough to invite his friends for a meal.

Example (2): The source audiovisual content of the Reel.(2)

Context: Masoud is thanking Hanafi, his best friend, for pretending to be the groom in his place (Table 3).

Table 3.

Example (3) a dialogue between Masoud and Hanafi, The Married Couples, “Al-Mutazawjoun”.


Source excerpt

Target excerpt

Masoud: حنفي انا من كل قلبي مش عار.. مالك؟‎
Hanafi: اصل لما حد يسلم علي على طول كده تلاقيني الشغال‎
Masoud: طب وريني الشمال? لا شغالة برضو. حقيقة مش عارف أرد لك?الجميل ده زاي حنفي ‎
Hanafi: عيب ما تقولش كده احنا اخوات يا جدع
Masoud: معلش بس انت قمت بدور العريس بطريقة
Hanafi: يا جدع ما تقولش كده عيب. وعهد الله لو طلبت مني أكمل لك الحكاية كلها انا تحت امرك.
Masoud: ما تهزرش معايا هزار جامد.
Hanafi: الله انت مش بتهزر معايه؟
Masoud: انا بهزر معك بأيدي انما دي مش ايد ده رجلي طلع لها ايد.

Masoud: Hanafi from the bottom of my heart. Don’t know. . . what’s wrong.
Hanafi: When someone shakes my hand, I move like that.
Masoud: Show me the left hand. It works too. I really don’t know how to repay you Hanafi
Hanafi: Don’t say that, man. We’re like brothers.
Masoud: It’s okay, but you played the groom’s part brilliantly.
Hanafi: Don’t mention it, man. If you want me to go all the way, I’m ready.
Masoud: Don’t joke around with me like that.
Hanafi: God, you also joked with me in the same way.
Masoud: I did it with my hand, but this is not a hand, it’s a foot that grew a hand.

Humorous Elements

Community-Sense-of-Humor Elements

Humor is provoked by reflecting on the dynamics of the relationship between Masoud and his close friend Hanafi. It taps into cultural norms whereby friends do each other favors without expecting praise in return: “يا جدع عيب ما تقولش كدا احنا اخوات” (Don’t say that, man. We’re like brothers). However, in the case of Masoud and Hanafi, Masoud’s utterance “معلش بس انت قمت بدور العريس بطريقة...” (It’s okay, but you played the groom’s part brilliantly) shows that the favor goes to an extreme: Hanafi pretends to be the groom in Masoud’s place, which is hardly a common favor between friends. The humor technique used here is a pun, as Masoud plays with words for comic effect. Humor is conveyed by means of direct transfer.

Non-marked Humorous Elements

Humor is also presented through non-marked elements that relate to the context and the character’s personality. For instance, Masoud’s utterance “انا بهزر معك بأيدي انما دي مش ايد ده رجل طلع لها ايد” (I did it with my hand, but this is not a hand, it’s a foot that grew a hand) is an exaggerated and absurd statement about the strength of Hanafi’s hand; it intensifies the humor and allows the scene to end on an absurd note. The humorous load is preserved through a literal (direct) translation strategy that maintains the meaning of the original text.

Paralinguistic Elements

Humor is also prosodically marked by a slight rise in Masoud’s relatively low pitch when, in an exaggerated statement, he refers to Hanafi’s physical strength after Hanafi jokes around with him by patting him on the shoulder: “انا بهزر معك بأيدي انما دي مش ايد ده رجلي طلع لها ايد.” (I did it with my hand, but this is not a hand, it’s a foot that grew a hand). The following screenshots from the Praat analysis indicate that Masoud’s pitch rises slightly in his final statement in both the source Arabic audio and the cloned English dubbing (Figures 5 and 6).

Figure 5.

Masoud’s slight rise in pitch signaling his sarcastic absurd statement measures 224.1 Hz.

Figure 6.

Masoud’s slight rise in pitch signaling his sarcastic absurd statement measures 235.5 Hz.

Humor Techniques

Context: Hanafi teasingly says to Masoud, “If you want me to be the real groom to the end, I am ready,” and Masoud gets angry and tells him not to say that (Table 4).

Table 4.

Example (4) a dialogue between Masoud and Hanafi(2), The Married Couples, “Al-Mutazawjoun”.

Source excerpt	Target excerpt	Humor techniques
Masoud: ما تهزرش معايا هزار جامد. Hanafi: الله انت مش بتهزر معايه؟ Masoud: انا بهزر معك بأيدي انما دي مش ايد ده رجلي طلع لها ايد.	Masoud: Don’t joke around with me like that. Hanafi: God, you also joked with me in the same way. Masoud: I did it with my hand, but this is not a hand, it’s a foot that grew a hand	Absurdity: Masoud’s exaggerated description of the physical strength of Hanafi’s hand as a foot that grew a hand is nonsensical, making it absurd and funny. The absurd exaggeration of a regular gesture adds to the comedic effect.

Madrasat Al-Moshaghebeen (School of Mischief)

Example (3): The source material of the Reel.(3)

Context: Bahgat and the rest of the students, including Morsi, Mansour and Lotfi, are planning to prank their new teacher, Effat. However, Ahmed, who unlike the other four students comes from a poor, humble social background, refuses to join their plan because he is afraid of being expelled (Tables 5 and 6).

Table 5.

Example (3) quoted from Madrasat Al-Moshaghebeen (School of Mischief).


Source excerpt

Target excerpt

Ahmed: بهجت‎
Bahgat: نعم؟‎
Ahmed: احنا صحاب مش كدا؟‎
Bahgat: أيوه‎
Ahmed: طيب انا مش هشترك معكم في الخطة دي‎
Bahgat: انت مش هشترك معاكم ليه؟‎
Ahmed: مش هشترك‎
Bahgat: ليه مش هشترك معاكم؟‎
Ahmed: .... لأن انا وضعي‎
Bahgat: أيوه‎
Ahmed: وضعي انا‎
Bahgat: أيوه‎
Ahmed: مختلف عنكم الاربعة‎
Bahgat: وضعك أيه؟‎
Ahmed: ما انت عارف وضعي مختلف عنكم انتو أربعة‎
Bahgat: مختلف!؟‎
Ahmed: !طبعًا‎
Bahgat: ايه اشول يعني انت ولا أي؟ والنبي تشوف أبو وضع مختلف زي ايه وحياتك
Morsi: تعالى يا أبو وضع. ايه الوضع؟
Ahmed: بقول وضعي مختلف عنكم انتو الأربعة‎
Morsi: أيوه يعني.. لما بتاكل بتشعر بإييه‎
Ahmed: هيه حكاية اكل! انا ازا اترفدت هترمي في شارع‎
Morsi: ولا يهمك هعيّنك عند ابويا‎
Ahmed: الموضوع مش زي ما انت فاهم‎
Morsi: الموضوع خرج من ايدي يا أستاز ما تعطلنيش!!! المصلحة مليانا موظفين! هوما فيش غيري بالمصلحة دي ولا ايه

Ahmed: Bahgat
Bahgat: Yes, Yes?
Ahmed: We’re friends, right?
Bahgat: Yes.
Ahmed: I won’t join this plan with you.
Bahgat: Why won’t you join the plan with us?
Ahmed: I won’t join.
Bahgat: Why won’t you join us?
Ahmed: Because my situation. . . .
Bahgat: Yes?
Ahmed: My situation is different from you four.
Bahgat: Your situation is what?!
Ahmed: You know my situation is different from the four of you.
Bahgat: Different?!
Ahmed: Sure.
Bahgat: Does that mean you’re left-handed or what? Please deal with the one with the different situation. (speaking to Morsi, referring to Ahmed)
Morsi: Come you with the situation. What’s the situation?
Ahmed: My situation is different from yours.
Morsi: The question is, how do you feel when you eat?
Ahmed: Is this really about eating! If I get expelled, I’ll be on the street.
Morsi: Don’t worry. I’ll get you a job at Dad’s.
Ahmed: It’s not like that.
Morsi: It’s no longer up to me. Don’t hold me up. This place is full of civil servants. Is there no one else here or what?

Table 6.

Example (3) quoted from Madrasat Al-Moshaghebeen (School of Mischief).

Source excerpt	Target excerpt	Humor technique
Morsi: تعالى يا أبو وضع. ايه الوضع؟ Ahmed: بقول وضعي مختلف عنكم انتو الأربعة Morsi: أيوه يعني.. لما بتاكل بتشعر بإييه	Morsi: Come you with the situation. What’s the situation? Ahmed: My situation is different from yours. Morsi: The question is, how do you feel when you eat?	Conceptual surprise + Absurdity: The audience at the theater may expect Morsi to ask why Ahmed refuses to take part in the prank and to try to convince him to participate. However, Morsi’s inquiry comes out as an unrelated question that is out of line with the seriousness of Ahmed’s situation. This shows how absurdly Morsi’s response contrasts with Ahmed’s serious speech.
Ahmed: بهجت Bahgat: نعم؟ Ahmed: احنا صحاب مش كدا؟ Bahgat: أيوه Ahmed: طيب انا مش هشترك معاكم في الخطة دي Bahgat: انت مش هشترك معاكم ليه؟ Ahmed: مش هشترك Bahgat: ليه مش هشترك معاكم؟	Ahmed: Bahgat Bahgat: Yes, Yes? Ahmed: We’re friends, right? Bahgat: Yes. Ahmed: I won’t join this plan with you. Bahgat: Why won’t you join the plan with us? Ahmed: I won’t join. Bahgat: Why won’t you join us?	Parody: Parody marks the humor in this scene, where Bahgat imitates Ahmed’s serious statement in a sarcastic tone, indicating that he does not take Ahmed’s decision seriously. Bahgat’s choice of “معاكم” (with you) instead of the grammatically appropriate “معانا” (with us) is an example of linguistic parody. By intentionally misusing the pronoun, Bahgat parodies Ahmed’s refusal to participate, turning the seriousness of the situation into something humorous. Because the English dubbing uses the correct pronoun “us,” the humorous effect of the parody is lost.
Ahmed: ما انت عارف وضعي مختلف عنكم انتو أربعة Bahgat: مختلف!؟ Ahmed: طبعًا! Bahgat: اشول يعني انت ولا إيه؟	Ahmed: You know my situation is different from the four of you. Bahgat: Different?! Ahmed: Sure. Bahgat: Does that mean you’re left-handed or what?	Wit: Bahgat’s rhetorical question is classified as ingenious humor: it shifts the serious tone of Ahmed’s statement into humorous mockery on Bahgat’s side. Bahgat’s response suggests that Ahmed’s “different situation” is simply that he is left-handed.
Ahmed: بقول وضعي مختلف عنكم انتو الأربعة Morsi: أيوه يعني.. لما بتاكل بتشعر بإييه Ahmed: هيه حكاية اكل! انا ازا اترفدت هترمي في شارع Morsi: ولا يهمك هعيّنك عند ابويا Ahmed: الموضوع مش زي ما انت فاهم Morsi: الموضوع خرج من ايدي يا أستاز ما تعطلنيش!!! المصلحة مليانا موظفين! هوما فيش غيري بالمصلحة دي ولا ايه	Ahmed: My situation is different from yours. Morsi: The question is, how do you feel when you eat? Ahmed: Is this really about eating! If I get expelled, I’ll be on the street. Morsi: Don’t worry. I’ll get you a job at Dad’s. Ahmed: It’s not like that. Morsi: It’s no longer up to me. Don’t hold me up. This place is full of civil servants. Is there no one else here or what?	Conceptual surprise: Morsi’s exaggerated utterance contradicts the audience’s perception of Morsi as a carefree, rebellious student who, realistically, does not work. This unexpected contrast adds to the humor and playfulness of Morsi’s character. Absurdity: The absurdity arises from Morsi’s exaggerated reaction, which is irrelevant to the serious situation that Ahmed intends to convey.

Humorous Elements

Linguistic Elements

The humorous load is embedded in Bahgat’s erroneous use of a pronoun: he echoes the 2nd person plural object pronoun in معاكم (with you) to mock Ahmed’s statement, indicating that he does not take seriously Ahmed’s wish not to join the plan to prank the newly arrived teacher, Effat Abd Al-Kareem. Mocking someone’s speech, or part of it, signals disapproval and belittlement of their behavior or attitude, which evokes humor throughout the development of the play’s events. In his question asking Ahmed to justify himself, “طيب انت مش هتشترك معاكم ليه؟”, Bahgat uses the 2nd person plural object pronoun in معاكم (with you) instead of the 1st person plural object pronoun in معانا (with us). The humorous load that lies in this erroneous use of pronouns in the source language is lost in the target language, where the 1st person plural object pronoun “us” is used correctly rather than repeating the 2nd person plural object pronoun “you.”

Unmarked Elements of Humor

Humor appears in Morsi’s irrelevant question, whose absurdity clashes with the seriousness of Ahmed’s situation. The unmarked humor in Morsi’s question “أيوه يعني.. لما بتاكل بتشعر بإييه” is preserved and directly transferred into the English dubbing: “The question is, how do you feel when you eat?”

Paralinguistic Elements of Humor

Performed humor, as in comedic plays and stand-up comedy shows, is mainly marked by speakers’ prosodic performance (Archakis et al., 2010). In conventional dubbing, voice talents are carefully chosen to dub comedy content based on how closely their voices match the original voices. However, staged humor, such as stand-up comedy shows and comedy plays, is usually subtitled rather than dubbed, because the humor is created by the way speakers say something rather than by what they say. Figure 7 presents the close match in the pitch analysis of the original Arabic audio and the English cloned dubbing. The analysis reveals that, in cross-lingual dubbing, voice cloning software can closely reproduce the humorous tonality of the source audio, thereby recreating prosodically marked humor in the target audio.

Figure 7.

The table indicates the close match of pitch variables between the source and the target audios in Example 3.

More detailed examples of humor marked by pitch appear in Morsi’s performance. The first humorous instance appears when Morsi asks Ahmed about his different “situation” with an irrelevant, silly question. The question is uttered with a relatively calm pitch to indicate Morsi’s absurdity, as illustrated in the Praat pitch-analysis screenshots in Figures 8 and 9.

Figure 8.

Pitch analysis of the audio taken from the source audio. The mean pitch measures 193.8 Hz.

Figure 9.

Pitch analysis of the audio taken from the target audio. The mean pitch measures 209.5 Hz.

Humor Techniques

Example (4): the source audiovisual content of the Reel.(4)

Context: Abla Effat, the new teacher, is trying to engage her rebellious students in a lesson about “absolute values”; however, they keep challenging her with absurd excuses (Table 7).

Table 7.

Example (4) quoted from Madrasat Al-Moshaghebeen (School of Mischief).


Source excerpt

Target excerpt

Effat: النهاردة حنتكلم عن القيم المطلقة‎
Morsi:؟ هنعمل ايه؟ ايه‎
Effat: .النهاردة يا اخ مرسي حنتكلم عن القيم المطلقة‎
Morsi: انا مش حتكلم معاكم‎
Bahgat: انا مش هقدر أتكلم معاكم بصراحه.‎
Effat: ليه‎
Bahgat: نعم
Effat: ليه
Bahgat: انا محبش أتكلم عن حد هو مش موجود
Morsi: ايه نتكلم ديه؟!‎
Effat: ده مش كلام‎ [Interrupted by Morsi]
Morsi: ايه نتكلم ديه ! هوة التعليم باظ من شويه! جايين المدرسة عشان‎ نتكلم ولا عشان نتعلم

Effat: Today we’ll talk about absolute values.
Morsi: What? What will we do today, <Morsi>? We’ll talk about absolute values.
Morsi: I won’t talk with you.
Bahgat: I can’t talk, frankly.
Effat: Why?
Bahgat: What?
Effat: Why?
Bahgat: I don’t like to talk about someone behind his back.
Morsi: Why should we talk?
Effat: It’s not like that. [interrupted by Morsi]
Morsi: Why should we talk? That’s why education has failed. Do we come to school to talk or learn?

Humorous Elements

Linguistic Elements

The use of certain linguistic forms also adds to the overt humorous elements. For instance, Morsi’s fierce, interrupting repetition of part of Effat’s speech, “ايه نتكلم دي! ايه نتكلم دية” (What do you mean by “let’s talk”?), signals in a childish way his frustration and his refusal to join the discussion. Humor is also evoked by the semantic deviation in the use of the verb “نتكلم” (to talk) from the meaning intended by Effat, which is “to discuss.” While Effat uses the verb to mean discussing a topic such as “absolute values,” Morsi and Bahgat interpret it as casual or inconsequential conversation. The contrast between what is intended (a serious discussion) and what is understood (a trivial conversation) produces comedic tension, making the exchange humorous.

Non-marked Humorous Elements

Non-marked humorous elements are encoded within the context of the exchange and its absurdity. For instance, Bahgat’s statement “أنا محبش أتكلم عن حد هو مش موجود” (I don’t like to talk about someone behind his back) indicates that he has no clue what the concept of “absolute values” means. Moreover, Morsi’s exaggerated exclamation about the notion of “talking” shifts the focus from the serious setting of the discussion to an irrelevant, absurd conversation.

Paralinguistic Humorous Elements

Tonality plays a significant role in highlighting humor in the scene, specifically Morsi’s high pitch in expressing his frustration and disapproval at Effat’s introduction of the topic. The following screenshots from Praat indicate how pitch marks humor and how that humor is preserved in English by means of voice cloning (Figures 10 and 11).

Figure 10.

Morsi’s relatively high pitch indicating his exaggerated frustration measures 267.8 Hz.

Figure 11.

Morsi’s relatively high pitch indicating his exaggerated frustration measures 285.2 Hz.
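The pitch values reported in the figure captions can be compared on a common scale. The sketch below expresses each source-target pair both as a percentage difference and in semitones, using the standard conversion 12 · log2(f_target / f_source). It is an illustration rather than part of the study’s method: the helper names are hypothetical, and it assumes that the lower-numbered figure in each pair shows the source audio, which the captions of Figures 5-6 and 10-11 do not state explicitly.

```python
import math

# (figure pair, source pitch in Hz, target pitch in Hz), taken from the captions.
# Assumption: the lower-numbered figure of each pair is the source audio.
REPORTED = [
    ("Figures 3-4", 269.8, 293.7),
    ("Figures 5-6", 224.1, 235.5),
    ("Figures 8-9", 193.8, 209.5),
    ("Figures 10-11", 267.8, 285.2),
]


def semitone_shift(source_hz, target_hz):
    """Signed pitch difference in semitones (positive = target is higher)."""
    return 12 * math.log2(target_hz / source_hz)


def percent_shift(source_hz, target_hz):
    """Signed pitch difference as a percentage of the source value."""
    return 100 * (target_hz - source_hz) / source_hz


for label, src, tgt in REPORTED:
    print(f"{label}: {percent_shift(src, tgt):+.1f}% "
          f"({semitone_shift(src, tgt):+.2f} semitones)")
```

Under these assumptions, the cloned audio comes out higher than the source in all four pairs, by roughly 5 to 9 percent, which is less than 1.5 semitones in each case.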

Humor Techniques

Example (5): The source AV content of the Reel(5).

Context: Abla Effat is trying to engage her disobedient students in a lesson about “absolute values”; however, they keep challenging her with absurd excuses, saying sarcastically that “Absolute values” is a person and that they do not like to talk about him in his absence (Table 8).

Table 8.

Example (4) quoted from Madrasat Al-Moshaghebeen (School of Mischief).

Source excerpt	Target excerpt	Humor techniques
Effat: النهاردة حنتكلم عن القيم المطلقة‎ Morsi:؟ هنعمل ايه؟ ايه‎ Effat: .النهاردة يا اخ مرسي حنتكلم عن القيم المطلقة‎ Morsi: انا مش حتكلم معاكم‎ Bahgat: .انا مش هقدر أتكلم معاكم بصراحه‎ Effat: ليه‎ Bahgat:نعم‎ Effat: انا محبش أتكلم عن حد هو مش موجود‎ Morsi: ايه نتكلم ديه؟! Effat: ده مش كلام [Interrupted by Morsi] Morsi: ايه نتكلم ديه ! هوة التعليم باظ من شويه! جايين المدرسة عشان نتكلم ولا عشان نتعلم‎	Effat: Today we’ll talk about absolute values. Morsi: What? What will we do today, <Morsi>? We’ll talk about absolute values. Morsi: I won’t talk with you. Bahgat: I can’t talk, frankly. Effat: Why? Bahgat: What? Effat: Why? Bahgat: I don’t like to talk about someone behind his back. Morsi: Why should we talk? Effat: It’s not like that. [interrupted by Morsi] Morsi: Why should we talk? That’s why education has failed. Do we come to school to talk or learn?	Absurdity: the entire conversation is of an absurd nature. This appears in the absurd responses of both Morsi and Bahgat, which clash with the formal discussion that Effat attempts to initiate.
Bahgat: انا مش هقدر أتكلم معاكم بصراحه.‎ Effat: ليه‎ Bahgat:نعم‎ Effat: ليه‎ Bahgat: انا محبش أتكلم عن حد هو مش موجود‎	Morsi: I won’t talk with you. Bahgat: I can’t talk, frankly. Effat: Why? Bahgat: What? Effat: Why? Bahgat: I don’t like to talk about someone behind his back.	Misunderstanding: Bahgat’s reply “نعم” (What?) implies a delay in response meant to add humorous hesitation. Furthermore, Bahgat’s excuse for not joining the talk indicates that he misunderstands the concept of “absolute values” that Abla Effat has introduced, taking it for a person whom his ethics do not allow him to talk about behind his back. The humor in this misunderstanding arises from the back-and-forth confusion between what is being asked and the unexpected responses, which elicit laughter through the absurdity of the interaction.
Effat: ده مش كلام‎ [Interrupted by Morsi] Morsi: ايه نتكلم ديه ! هوة التعليم باظ من شويه! جايين المدرسة عشان نتكلم ولا عشان نتعلم‎	Effat: It’s not like that. [interrupted by Morsi] Morsi: Why should we talk? That’s why education has failed. Do we come to school to talk or learn?	Parody: Morsi’s exaggerated outburst “Do we come to school to chat or to learn?!” imitates the scolding utterance that teachers usually use to emphasize discipline and focus in the classroom. Morsi takes the imitation to an extreme by suggesting that talking in class drives the failure of the education system (“That’s why education has failed”), humorously criticizing the rigidity of school educators and echoing public criticism of the education system in Egypt.

Context: The scene presents an exchange between Abla Effat and Morsi, one of the five rebellious students, in which Effat asks Morsi a question about logic, a topic that is seemingly beyond his limited knowledge (Tables 9 and 10).

Table 9.

Example (5): quoted from Madrasat Al-Moshaghebeen.

Source excerpt	Target excerpt
Effat: تعرف ايه عن المنطق؟‎ Morsi: ايوه‎ Effat: يلا تفضل‎ Morsi: يزيد فضلك تفضلي انتي‎ Effat: تعرف ايه عن المنطق يا مرسي؟‎ Morsi: ايوه يا أبلتي‎ Effat: تعرف ايه عن المنطق؟‎ Morsi: اهو ايوه‎ Effat: أتكلم‎ Morsi: اتكلم ازاي؟‎ Effat: تعرف ايه عن المنطق؟‎ Morsi:فين السؤال؟ فين السؤال؟ انا احط ايدي على السؤال تلايني فريرة. فين السؤال؟‎ Effat: هو ده‎ Morsi: اللي هو؟‎ Effat: تعرف ايه عن المنطق؟‎ Morsi: اعرف‎ Effat: تفضل‎ Morsi: الله يخليك‎ اعرف ان لما واحد يضرب واحد على دماغه يوقع ما يحطش منطق‎ هوة دا المنطق ولا مش هو! هوة ولا مش هوة يا متعلمة يا بتوعت المدارس! هو ولا مش هوة؟‎ Effat: هو بعينه‎ Morsi: انا عارف كل حاجة بس مدكن‎ Bahgat: هو ده المنطق؟‎ Effat: امال‎ Bahgat: هو ده؟‎ Effat: امال‎ Bahgat: الله انت بالذكر من ورانا ولا ايه؟‎	Effat: What do you know about logic? Morsi: Yeah Effat: Come on. Go ahead. Morsi: Thank you. You go ahead first. Effat: what do you know about Logic Morrissey? Morsi: Yes, teacher. Effat: What do you know about logic? Morsi: Yes, I do Effat: speak up. Morsi: How do I speak? Effat: What do you know about logic? Morsi: What is the question? What is it? If I know the question, I’ll answer right away. What is the question? Effat: This is it Morsi: which is?! Effat: what do you know about logic? Morsi: I know. Effat: Go ahead. Morsi: Thank you. I know that if one hits the other on the head, he falls down with no logic. Isn’t this the logic? Is it so or not? You educated one and teacher of schools. Is it so or not? Is it so or not? Effat: Yes, it is. It is exactly. Morsi: I know everything, but I keep it in my mind. Bahgat: is this logic? Effat: Of course. Bahgat: Is that it? Effat: Yes, definitely. Bahgat: Are you studying secretly without our knowledge or what?

Table 10.

Example (5): quoted from Madrasat Al-Moshaghebeen.

Source Excerpt	Target Excerpt	Humor Technique
Morsi: اعرف ان لما واحد يضرب واحد على دماغه يوقع ما يحطش منطق‎ هوة دا المنطق ولا مش هو! هوة ولا مش هوة يا متعلمة يا بتوعت المدارس! هو ولا مش هوة؟‎	Morsi: I know that if one hits the other on the head, he falls down with no logic. Isn’t this the logic? Is it so or not? You educated one and teacher of schools. Is it so or not? Is it so or not?	Absurdity: Absurdity appears in Morsi’s oversimplification of a complex topic such as logic into a violent act. Misunderstanding: Furthermore, his oversimplified answer indicates that he misunderstands the depth of the concept of “logic.”
Morsi: اعرف ان لما واحد يضرب واحد على دماغه يوقع ما يحطش منطق‎ هوة دا المنطق ولا مش هو! هوة ولا مش هوة يا متعلمة يا بتوعت المدارس! هو ولا مش هوة؟‎	Morsi: I know that if one hits the other on the head, he falls down with no logic. Isn’t this the logic? Is it so or not? You educated one and teacher of schools. Is it so or not? Is it so or not?	Outwitting: Morsi humorously outsmarts Effat, thinking that his absurd answer is reasonable. He goes on to mock her and her educational status, refuting the notion that those who receive formal education have higher intellectual abilities.
Bahgat: هو ده المنطق؟‎ Effat: امال‎ Bahgat: هو ده؟‎ Effat: امال‎ Bahgat: الله انت بالذاكر من ورانا ولا ايه؟‎	Bahgat: is this logic? Effat: Of course. Bahgat: Is that it? Effat: Yes, definitely. Bahgat: Are you studying secretly without our knowledge or what?	Misunderstanding: Bahgat’s reaction indicates his misunderstanding of “logic.” Not realizing the absurdity of Morsi’s answer, Bahgat takes it seriously, especially after Effat’s confirmation, and humorously accuses Morsi of studying secretly without them. Humor is evoked through the clash between Bahgat’s serious tone of accusation and Morsi’s absurd answer.
Effat: تعرف ايه عن المنطق؟‎ Morsi: ايوه‎ Effat: يلا تفضل‎ Morsi: يزيد فضلك تفضلي انتي‎ Effat: تعرف ايه عن المنطق يا مرسي؟‎ Morsi: ايوه يا أبلتي‎ Effat: تعرف ايه عن المنطق؟‎ Morsi: اهو ايوه‎ Effat: أتكلم‎ Morsi: اتكلم ازاي؟‎	Effat: What do you know about logic? Morsi: Yeah Effat: Come on. Go ahead. Morsi: Thank you. You go ahead first. Effat: what do you know about Logic Morrissey? Morsi: Yes, teacher. Effat: What do you know about logic? Morsi: Yes, I do Effat: speak up. Morsi: How do I speak?	Repartee: the back-and-forth verbal exchange between Effat and Morsi creates humor through Morsi’s attempts to stall instead of answering Effat directly. Humor is also evoked by Effat’s attempts to stay calm and maintain her patience, which starts to wear thin in the course of the conversation. However, Morsi’s playful and nonchalant attitude only serves to frustrate her further.

Humorous Elements

Linguistic Humorous Elements

Morsi’s witty use of language, specifically stalling utterances such as “ايوه” (OK), “فين السؤال؟ فين السؤال؟‎” (What is the question?) and “اتكلم ازاي؟‎” (How can I speak?), shows how he manipulates the conversation to delay answering Effat’s direct question instead of explicitly saying that he does not know the answer. Morsi’s absurd replies to Effat’s serious question create humorous tension that keeps the audience waiting for his next utterance.

Community-and-Institutions Elements

Humor is also encoded in social and cultural references that reflect particular dimensions of the Egyptian community. For instance, Morsi’s sarcastic addition after his absurd answer about logic, “يا متعلمه يا بتوعت المدارس” (You educated one and teacher of schools), pokes fun at the notion that formal education necessarily leads to better understanding. Morsi shifts the power dynamic, making himself seem wiser or more intellectual despite being ignorant.

Community-Sense-of-Humor Elements

Morsi’s absurd answer to Effat’s serious question sheds light on his limited intellectual capability. Morsi oversimplifies a complex topic such as logic into a violent action, “لما واحد يضرب واحد على دماغه يوقع ما يحطش منطق‎” (I know that if one hits the other on the head, he falls down with no logic), yet he acts as if he has just provided a perfectly reasonable answer. His attitude reflects a common tendency in Arab society in general, and Egyptian society in particular, to reduce complex topics that lie beyond one’s knowledge to references from daily life. The humor is relatable to the audience since they are familiar with Morsi’s absurd attitude.

Non-marked Humorous Elements

The non-marked humorous element is highlighted by Bahgat’s final exclamation in reaction to Morsi’s absurd answer, “الله انت بالذاكر من ورانا ولا ايه‎” (Are you studying secretly without our knowledge or what?). Not realizing the absurdity of Morsi’s answer, Bahgat turns to Morsi and accuses him of betraying them by studying secretly. Humor lies in the clash between what a student would normally be expected to do and Bahgat’s scolding reaction, as if studying were something they should not do and Morsi had deviated from this norm by studying secretly without their knowledge.

Paralinguistic Humorous Elements

Tonality significantly marks humor in this scene, particularly in Morsi’s exaggerated verbal performance. There is a drastic shift in Morsi’s tonality, from a low pitch in his stalling utterances to a higher pitch in his exaggerated, absurd response. The following screenshots from Praat indicate how Morsi’s high pitch marks humor and how it is preserved in English by means of voice cloning (Figures 12 and 13).

Figure 12.

Morsi’s relatively high pitch in his exaggerated response measures 337 Hz.

Figure 13.

Morsi’s relatively high pitch in his exaggerated response measures 305.4 Hz.
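The pitch values in Figures 10 to 13 were read from Praat. As a rough illustration of what such a measurement involves, the following minimal Python sketch estimates the fundamental frequency (F0) of a short signal frame from its autocorrelation peak. It is not Praat's implementation, which is considerably more refined; the function name `estimate_f0`, the 75 to 500 Hz search range, and the synthetic test tone are assumptions made for illustration only.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Roughly estimate F0 (Hz) of one frame from its autocorrelation peak.

    Hypothetical helper for illustration; the search range is an assumed default.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove any DC offset

    min_lag = max(1, int(sample_rate / f0_max))      # shortest period considered
    max_lag = min(int(sample_rate / f0_min), n - 1)  # longest period considered

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation: similarity of the frame with a copy
        # of itself shifted by `lag` samples.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 250 Hz tone (an assumed stand-in for a voiced speech frame):
# 800 samples, i.e. 50 ms at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 250.0 * t / SAMPLE_RATE) for t in range(800)]
estimated = estimate_f0(tone, SAMPLE_RATE)
print(f"Estimated F0: {estimated:.1f} Hz")
```

Real speech frames are noisier than a pure tone, so a sketch like this is prone to octave errors; it is meant only to show the principle behind the reported Hz values.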

Humor Techniques

Discussion of the Findings

The first layer of analysis examined the humorous elements found in the data, classified according to Martínez-Sierra’s (2006) taxonomy of humorous elements, and the humor techniques used in the content, classified according to Juckel et al.’s (2016) typology of humor techniques. The analysis identified a variety of humorous elements in the reels, stemming from the diversity of the original content. In particular, it identified culture-related elements, namely community-and-institution elements and community-sense-of-humor elements. By identifying these elements and understanding how they contribute to the overall humorous effect, the translator was able to make translation choices that preserve that effect in the English-dubbed reels, especially for elements with cultural and social specificity. Most significantly, the role of paralinguistic elements, particularly prosodic cues such as voice pitch, in delivering humor was marked in all reels, with only minor instances of language-based humor. The analysis indicates that ElevenLabs’ voice cloning technology effectively replicates these prosodic cues, with differences in voice pitch between the original and cloned voices ranging from 10 to 30 Hz, which is deemed insignificant. These differences are attributed to voice cloning’s control over aspects such as pitch, volume, and intensity, which makes pitch differences between characters more pronounced. Moreover, the lower quality of the original recordings, which often come from older films and plays, also contributes to these minor differences.
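To make the reported pitch comparison concrete, the sketch below computes the absolute difference in Hz, and the corresponding interval in semitones, for the peak pitch values given in the captions of Figures 10 to 13. The pairing of each figure with the source or the cloned audio is an assumption (the captions do not state it), and the semitone conversion is a standard perceptual scale added here for illustration, not a measure used in the study.

```python
import math

# Peak pitch values (Hz) taken from the captions of Figures 10-13.
# Assumption: in each pair the first value is the source audio and the second
# the cloned English audio; the captions themselves do not say which is which.
PITCH_PAIRS = {
    "Figures 10-11": (267.8, 285.2),
    "Figures 12-13": (337.0, 305.4),
}


def pitch_difference(source_hz, target_hz):
    """Return (absolute difference in Hz, signed interval in semitones)."""
    diff_hz = abs(target_hz - source_hz)
    semitones = 12 * math.log2(target_hz / source_hz)
    return diff_hz, semitones


for label, (source_hz, target_hz) in PITCH_PAIRS.items():
    diff_hz, semitones = pitch_difference(source_hz, target_hz)
    print(f"{label}: {diff_hz:.1f} Hz difference, {semitones:+.2f} semitones")
```

Expressing the gap in semitones as well as in Hz can help when comparing speakers with different baseline pitch, since the same Hz difference corresponds to a smaller interval at higher frequencies.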

The majority of the verbal content was transferred directly into English, which maintained the humor in some cases but led to losses in others. To handle cultural references, colloquial structures, and idiomatic expressions, the translator employed a mix of translation strategies, including paraphrasing, resignation (intentionally omitting a part of the source text because it is hard to translate into the target language), expansion, omission, and condensation (shortening or simplifying the text), making the humor more relatable and natural to an English-speaking audience.

Regarding humor techniques, the content involved a mix of techniques related to language (e.g., irony, repartee, wit), logic (e.g., absurdity, conceptual surprise, outwitting, misunderstanding), and identity (e.g., parody, rigidity, self-deprecation). As was evident in the analysis, most of the humor techniques used are related to logic, particularly absurdity, conceptual surprise, and misunderstanding. These techniques were preserved in the English dubbing by means of a direct transfer strategy.
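The grouping of techniques into language, logic, and identity can be expressed as a simple lookup. The minimal sketch below tallies, by category, the techniques labeled in Tables 8 and 10 only; the variable names are illustrative assumptions, and the tally covers just these two examples rather than all eight reels.

```python
from collections import Counter

# Category membership as listed in the discussion of humor techniques
# (after Juckel et al., 2016).
CATEGORY_OF = {
    "irony": "language", "repartee": "language", "wit": "language",
    "absurdity": "logic", "conceptual surprise": "logic",
    "outwitting": "logic", "misunderstanding": "logic",
    "parody": "identity", "rigidity": "identity", "self-deprecation": "identity",
}

# Technique labels as they appear in the third column of Tables 8 and 10.
LABELED_TECHNIQUES = {
    "Table 8": ["absurdity", "misunderstanding", "parody"],
    "Table 10": ["absurdity", "misunderstanding", "outwitting",
                 "misunderstanding", "repartee"],
}

# Map every labeled technique to its category and count per category.
category_counts = Counter(
    CATEGORY_OF[technique]
    for techniques in LABELED_TECHNIQUES.values()
    for technique in techniques
)
print(dict(category_counts))
```

For these two examples, six of the eight labels fall under logic, which is in line with the observation that logic-related techniques dominate.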

Viewers’ Perception Toward Voice Cloning as a Dubbing Tool

This layer of analysis draws on viewers’ perceptions of the efficiency of voice cloning in dubbing Egyptian comedy into English. The analysis compares the perceptions of the two groups in the study’s population: Arabic-English bilinguals and native English speakers. The sample consists of 11 participants from each group. Both questionnaires address viewers’ perceptions of the preservation of various aspects of humor, such as cultural adaptation, comedic timing, and humor tonality, and, most importantly, the validity of voice cloning in conveying humor compared with subtitling and conventional dubbing.

As Figure 14 shows, most of the Arabic-English bilinguals enjoyed the dubbed reels; however, they found them less humorous than the original Arabic content. Their views varied regarding the naturalness of the cloned voices. While a slight majority found that voice cloning preserved key vocal qualities, some expressed dissatisfaction with its ability to capture comedic timing and cultural humor. More than half of the participants found that the humor was rendered well into English, yet they struggled to understand some cultural references. They generally agreed that voice cloning is a more effective dubbing tool than conventional dubbing in preserving voice authenticity. Nevertheless, opinions on its revolutionary potential were divided. The native English speakers, on the other hand, enjoyed and comprehended the dubbed reels and were mostly satisfied with the naturalness and consistency of the cloned voices. Moreover, they deemed voice cloning effective in preserving emotional tone and personality alignment. However, their views on comedic timing were neutral, and they felt that cultural adaptation was not adequately reflected in the dubbed reels. Although most of them prefer subtitles, they acknowledged that voice cloning made the humor more accessible than subtitling and conventional dubbing did. In brief, both groups recognized voice cloning’s potential as a dubbing tool, though they suggested slight improvements in voice quality and naturalness, along with better renditions of cultural elements.

Figure 14.

Viewers’ perception toward voice cloning as a dubbing tool.

Conclusions

The study revealed that the English dubbing maintained the essence of humor by means of a literal or direct translation strategy, particularly humor evoked through techniques related to logic and character identity. However, the English dubbing fell short of accounting for some of the cultural and contextual aspects of a few instances of humor when they were transferred directly. It was evident in the data analysis that most of the cultural aspects of humor were paraphrased or resigned to fit the target audience’s cultural background and social dynamics. Nevertheless, a slight majority of native English speakers found some cultural aspects of humor confusing. Native English speakers reported varying levels of enjoyment of the dubbed content. Bilinguals, on the other hand, mainly indicated that the dubbed content is much less funny than the source Arabic content. This variation in enjoyment is mainly affected by the extent to which the viewer is familiar with Arabic humor in general and Egyptian humor in particular.

Regarding the efficiency of voice cloning as a tool for dubbing Egyptian humor into English, the majority of both groups found voice cloning relatively efficient in capturing the emotions and tone of the original comedy. Accordingly, voice cloning outperformed subtitling in making humor more accessible and comprehensible to an English-speaking audience. Most significantly, a slight majority of both groups believe that voice cloning, to some extent, outperformed conventional dubbing in conveying Egyptian humor. These findings support the assumption that viewers are likely to consider voice cloning an effective tool for dubbing comedy content into English. The study recommends that experts in the AVT industry be more open to integrating advanced AI technologies such as AI voice cloning to enhance tonal and emotional fidelity in dubbed content, especially comedic content, where the delivery of humor is highly dependent on performed vocal nuances. In this way, the hope for full dubbing conducted in professional settings and produced with high-quality voice cloning technology may be realized. Future research is recommended to examine the utility of voice cloning in dubbing other genres of audiovisual content besides comedy, such as drama, documentaries, horror, and cartoons.

Footnotes

ORCID iD

Ahmad Mohammad Al-Harahsheh

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on request.

References

Abdelwareth (2017). The story of “Abla Nazira,” author of the most famous cookbook. https://lite.almasryalyoum.com/extra/138975/

Al-Abbas, L. S., & Haider, A. S. (2021). Using modern standard Arabic in subtitling Egyptian comedy movies for the deaf/hard of hearing. Cogent Arts and Humanities, 8(1), 1–15. https://doi.org/10.1080/23311983.2021.1993597

Archakis, Giakoumelou, Papazachariou, & Tsakona (2010). The prosodic framing of humour in conversational narratives: Evidence from Greek data. Journal of Greek Linguistics, 10, 187–212.

Arik, Diamos, Gibiansky, Miller, Peng, Ping, Raiman, & Zhou (2017, May 24). Deep voice 2: Multi-speaker neural text-to-speech [Conference session]. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). https://doi.org/10.48550/arXiv.1705.08947

Arik, S. Ö., Chen, Peng, Ping, & Zhou (2018). Neural voice cloning with a few samples [Conference session]. Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). https://doi.org/10.48550/arXiv.1802.06006

Attardo & Raskin (1991). Script theory revis(it)ed: Joke similarity and joke representation model. Humor: International Journal of Humor Research, 4(3–4), 293–347. https://doi.org/10.1515/humr.1991.4.3-4.293

Balyan, Agrawal, & Dev (2013). Speech synthesis: A review. International Journal of Engineering Research and Technology, 2(6), 57–75.

Baños & Chaume (2009). Prefabricated orality: a challenge in audiovisual translation. In Giorgio Marrano, Nadiani, & Rundle (Eds.), The translation of dialects in multimedia. Special issue of InTRAlinea (pp. 1–6). http://discovery.ucl.ac.uk/id/eprint/1425739

Berger, A. A. (2017). An anatomy of humor. Routledge.

Chaume (2020). Dubbing. In Bogucki, Ł., & Deckert (Eds.), The Palgrave handbook of audiovisual translation and media accessibility. Palgrave studies in translating and Interpreting (pp. 103–132). Palgrave Macmillan.

Chen, Tan, & Zhou (2022). V2C: Visual voice cloning [Conference session]. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR52688.2022.02056

Chiaro (2024). Verbally expressed humor and translation. In Ford, Chłopicki, & Kuipers (Eds.), De Gruyter handbook of humor studies (pp. 569–608). De Gruyter.

Díaz-Cintas (2009). Audiovisual translation. In Angelone, Ehrensberger-Dow, & Massey (Eds.), The Bloomsbury Companion to Language Industry Studies (pp. 209–230). Bloomsbury Publishing Plc.

Dore (2019). Humour in audiovisual translation: Theories and applications. Routledge.

Dore (2020). Humour translation in the age of multimedia (1st ed.). Routledge.

ElevenLabs. (2024). Overview. https://elevenlabs.io/docs/product/voices/voice-lab/overview

Federico, Enyedi, Barra-Chicote, Giri, Isik, Krishnaswamy, & Sawaf (2020). From Speech-to-Speech translation to automatic dubbing. arXiv Preprint arXiv:2001.06785, 257–264. https://doi.org/10.18653/v1/2020.iwslt-1.31

Gambier (2008). Recent developments and challenges in audiovisual translation research. In Chiaro, Heiss, & Bucaria (Eds.), Between text and image: Updating research in screen translation (pp. 11–33). John Benjamins Publishing Company.

Hatami, Chen, Kholidy, Blasch, & Ardiles-Cruz (2024). A survey of the real-time metaverse: challenges and opportunities. Future Internet, 16(10), 1–52. https://doi.org/10.3390/fi16100379

Zhu (2023). A real-time voice cloning system with multiple algorithms for speech quality improvement. PLoS ONE, 18(4), e0283440. https://doi.org/10.1371/journal.pone.0283440

Jemine (2019). Real-time voice cloning [Master Dissertation, Université Liège].

Juckel, Bellman, & Varan (2016). A humor typology to identify humor styles used in sitcoms. Humor - International Journal of Humor Research, 29(4), 583–603.

Klatt, D. H. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82(3), 737–793. https://doi.org/10.1121/1.395275

Xie, Zhang, Ren, Liu, Yao, & Ren, F. R. (2024). A review of human emotion synthesis based on generative technology. arXiv preprint arXiv:2412.07116.

Maksymchuk, Horenko, & Sushko (2023). Teaching simultaneous translation in social networks to future interpreters. 17(85), 229–233.

Martínez-Sierra, J. J. (2006). Translating audiovisual humour. A case study. Perspectives, 13(4), 289–296. https://doi.org/10.1080/09076760608668999

Misener (2017). Say what? How a Canadian company can clone your voice. https://www.cbc.ca/news/science/lyrebird-clones-voices-1.4084423

Napolitano (2020). The cultural origins of voice cloning. In Proceedings of the Eighth Conference on xCoAx 2020: Computation, Communication, Aesthetics & X (pp. 59–73).

Nassif (2021). Codeswitching between Modern Standard and colloquial Arabic as L2 sociolinguistic competence. Applied Pragmatics, 3(1), 26–50. https://doi.org/10.1075/ap.19022.nas

Nida, E. A. (1964). Toward a science of translating. J. Brill Leiden.

Nițu, V. L. (2024). Enhancing player immersion: Automatic AI localisation of romanian dialogue in video games [BS thesis, University of Twente].

Pérez, Díaz-Munío, G. G., Giménez, Silvestre-Cerdà, J. A., Sanchis, Civera, Jiménez, Turró, & Juan (2021). Towards cross-lingual voice cloning in higher education. Engineering Applications of Artificial Intelligence, 105, 1–9. https://doi.org/10.1016/j.engappai.2021.104413

Project Revoice. (2019). A voice cloning initiative from the ALS association. Retrieved May, 2025, from https://www.projectrevoice.org/

Raskin (1985). Semantic mechanisms of humor. Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society, (1979), 325–335. https://doi.org/10.3765/bls.v5i0.2164

Sanyal, Asama, Godhalaa, & Ghoshb (2024). H2E: Transforming Hindi hits into English epics. https://ssrn.com/abstract=5000499 or http://dx.doi.org/10.2139/ssrn.5000499

Shehata, S. S. (1992). The politics of laughter: Nasser, Sadat, and Mubarek in Egyptian political jokes. Folklore, 103(1), 75–91. https://doi.org/10.1080/0015587x.1992.9715831

Sperber & Wilson (1986). Relevance: Communication and cognition. Blackwell.

Thomas (2024). AI and actors: ethical challenges, cultural narratives and industry pathways in synthetic media performance. Emerging Media, 2(3), 523–546. https://doi.org/10.1177/27523543241289108

Vermeer, H. J. (1989). Skopos and translation order. Heidelberg: Abt. Allg. Übersetzungs- u. Dolmetschwiss. d. Inst. für Übersetzen u. Dolmetschen d. Univ.

Wang, Skerry-Ryan, R. J., Stanton, Weiss, R. J., Jaitly, Yang, Xiao, Chen, Bengio, Agiomyrgiannakis, Clark, & Saurous, R. A. (2017). Tacotron: Towards end-to-end speech synthesis [Conference session]. Proceedings of the Interspeech 2017. https://doi.org/10.21437/Interspeech.2017-1452

Williams (2024). Voice in the machine: AI voice cloning in film. Art Style, Art & Culture International Magazine, 13(13), 129–143.

Zabalbeascoa (2005). Humor and translation: An interdiscipline. Humor - International Journal of Humor Research, 18(2), 185–207. https://doi.org/10.1515/humr.2005.18.2.185

Zabalbeascoa (2020). The role of humour in AVT: AVHT. In Bogucki, Ł., & Deckert (Eds.), The Palgrave handbook of audiovisual translation and media accessibility (pp. 667–686). Springer International Publishing.