Abstract
In China, deepfakes are commonly known as huanlian, which literally means “changing faces.” Huanlian content, including face-swapped images and video reenactments, has been circulating in China since at least 2018, at first through amateur users experimenting with machine learning models and then through the popularization of audiovisual synthesis technologies offered by digital platforms. Informed by a wealth of interdisciplinary research on media manipulation, this article aims to historicize, contextualize, and disaggregate huanlian in order to understand how synthetic media is domesticated in China. After briefly summarizing the global emergence of deepfakes and the local history of huanlian, I discuss three specific aspects of their development: the launch of the ZAO app in 2019 with its societal backlash and regulatory response; the commercialization of deepfakes across formal and informal markets; and the communities of practice emerging around audiovisual synthesis on platforms like Bilibili. Drawing on these three cases, the conclusion argues for the importance of situating specific applications of deep learning in their local contexts.
This must be real
In early September 2020, a video depicting US President Donald Trump and Secretary of State Mike Pompeo singing the patriotic song “Wo Ai Ni Zhongguo” [我爱你中国 “I Love You, China”] started circulating on Chinese social media. The video was shared across WeChat groups and reposted on the microblogging platform Sina Weibo (Yang Axing jiang yuwen, 2020), racking up thousands of views and comments, most of which remarked on how funny and well-made it was:

“Haha, yes, the mouth shapes are exactly the same!”
“Trump’s Chinese heart”
“Too funny! The editing is great”
“Very talented”
“Fuck me, video can’t be Photoshopped, so this must be real”
In the span of a few days, the clip was reuploaded on YouTube and made it to Twitter via a post by Hu Xijin, chief editor of state-owned tabloid newspaper Global Times, who attributed its authorship to unspecified Chinese internet users (Hu, 2020). The 1-minute video was widely interpreted as an expression of the popularity that the Trump administration had among Chinese audiences; for chuanfen [川粉 “Trump fans”], the US president’s counterproductive policies were both a boon to China’s rise on the global stage and a welcome counter to Western liberalism (Lin, 2021). As the 2020 presidential election neared, chuanfen expressed their wish for Trump to secure a second term in office: seeing him duetting an iconic patriotic anthem with his Secretary of State perfectly encapsulated this wave of popular support.
Obviously, the two protagonists of the video did not actually sing “Wo Ai Ni Zhongguo” at any point in time: the many comments praising its creator’s skill in editing and lip-syncing acknowledged that the clip was the result of a masterful media manipulation. Like many other pieces of audiovisual digital folklore circulating on Chinese social media, the Trump–Pompeo duet was originally posted on the video-sharing platform Bilibili (Figure 1). Uploaded in early September 2020 by creator UFO Shang de Shuchong [“Bookworm on a UFO”], the video quickly blew up in views, ranking 46th among the most watched videos on the entire platform—as of February 2021, it had been played more than three million times, with more than 7000 bullet comments 1 scrolling over it (UFO Shang de Shuchong, 2020). In the video description, its creator states that “The election is in a desperate situation,” inviting viewers to “Protect our Trump” by posting more bullet comments. Even if the video page clearly indicated that republishing the clip was prohibited without the author’s consent, many commenters warned UFO Shang de Shuchong about the video’s unauthorized circulation on other platforms and even news media channels. Thanking them for the notifications, he explained that he would keep reporting any media that republished his video without the Bilibili watermark, because this sort of “blood-sucking” behavior hurt the creators of original content.

Figure 1. Bilibili video “Danger! Top-secret video of Trump and Pompeo showing their love for China leaked! I Love You, China” by UFO Shang de Shuchong, with hundreds of bullet comments scrolling in front. The most frequent comments include: hahahaha [哈哈哈哈 laughter], leimu [泪目 “crying”], wo cao [卧槽 “fuck”], and “???????”. Screenshot by the author.
The video created by UFO Shang de Shuchong is a deepfake. Using an open-source deep learning model, he generated an animation from still images of Donald Trump and Mike Pompeo driven by the facial motions of himself singing “Wo Ai Ni Zhongguo,” and then added a popular version of the song performed by The Voice of China contestant Ping An as the audio track. The result, meant to entertain Bilibili audiences rather than to deceive viewers, is an example of how techniques of media manipulation and audiovisual synthesis are becoming increasingly available to amateur creators. From its debut on Bilibili to its circulation on social media platforms, the video belongs to the category of huanlian [换脸, literally “changing face”], the Chinese term for face-swapping, lip-syncing, and other synthetic media generated with deep learning models. While some of the appeal of this clip lies in its technical novelty, huanlian content has circulated on Chinese digital platforms since at least 2018 in the form of images, videos, animated GIFs, and interactive applications, following the development of new machine learning models and frameworks. As a self-proclaimed non-professional creator 2 , Bilibili user UFO Shang de Shuchong generated this video building on the work of computer scientists, software developers, and tech companies, and in turn contributed to the incorporation of huanlian into the broader repertoires of Chinese digital folklore.
Drawing on a literature review of synthetic media research, the analysis of industry reports, app walkthroughs, and digital ethnography, this article surveys the emergence, popularization, commercialization, and regulation of huanlian—as both vernacular content and creative practice—on Chinese digital media platforms. After briefly introducing deepfakes and global research about synthetic media, the first section frames huanlian in the context of Chinese internet history and vernacular creativity, arguing that novel techniques of audiovisual manipulation must be understood in light of existing repertoires of digital folklore and practices of digital media use. I then discuss three specific cases that have shaped the peculiar trajectory of huanlian in China. The first is the launch of the ZAO app in 2019, which popularized the creation of face-swapped content and precipitated societal concerns around deepfakes, triggering a substantial regulatory effort from Chinese authorities. The second case is the commercialization of deepfakes in both the formal market of mainstream media production and the informal market of underground pornography and other legal gray areas. The third case revolves around the communities of huanlian creators and audiences gathering on platforms like Bilibili to share knowledge about the technical aspects of audiovisual manipulation and showcase their creations. In the conclusion, I argue that contextualizing huanlian through these three case studies helps us understand how a specific application of deep learning is tied to broader developments in global machine vision research, national AI governance, and situated user practices.
Deepfakes and huanlian: A brief history
The origin of deepfakes is well established and closely tied to the sort of digital media practices that researchers like Jean Burgess have termed “vernacular creativity” (2006). The term “deepfake” itself comes from the nickname of the Reddit user who, in late 2017, shared their creation of a porn video in which an actress’s face was replaced with that of Hollywood star Gal Gadot (Fikse, 2018). As documented by Samantha Cole’s deep dive on the r/deepfakes subreddit (2017), the technique developed by this user was primarily applied to create face-swapped pornography featuring female celebrities—an unsettling development in the history of fake celebrity porn, which had up to then been relegated to static photomontages. The Reddit user never revealed their identity, but suggested that their work was based on open-source deep learning libraries, using algorithms similar to image synthesis products developed by computer graphics multinational Nvidia. Image manipulation has a history as long as that of photography, and academic research on automated video manipulation has been pursued by computer scientists since the late 1990s, but it was only in the mid-2010s that advancements in machine learning made techniques like facial reenactment less computing-intensive and more scalable (Thies et al., 2016). Despite the r/deepfakes subreddit being shut down in February 2018, the term deepfake has been widely adopted to describe “believable media generated by a deep neural network” (Mirsky and Lee, 2021: 3), and deepfakes have proliferated as more creators experimented with new machine learning models (Brock et al., 2019). At the same time, tech companies like Snap, Apple, and ByteDance have integrated deepfake functions in their platforms, offering their users playful image filters and animated emoticons (Gershgorn, 2020).
From celebrity porn and humorous memes to advertisement and media art, by 2020 deepfakes had undeniably become a mainstream phenomenon (Hao and Heaven, 2020).
Researchers across disciplines have extensively discussed the philosophical, cultural, social, legal, and geopolitical implications of deepfakes. The scale at which this new kind of automated media manipulation can be deployed, along with the sophistication of the synthesized content, has led scholars to identify the emergence of “post-fact performance” (Fletcher, 2018) challenging established notions of authenticity (Floridi, 2018), eroding trust in the news (Vaccari and Chadwick, 2020), undermining the testimonial value of recorded media (Rini, 2020), and ultimately becoming an “epistemic threat” to societies (Fallis, 2020). A report by visual threat intelligence company Sensity correlates the rapid growth of deepfakes online—largely driven by pornography—with an increase in computer science research on the subject and the commodification of audiovisual synthesis tools (Ajder et al., 2019). Similarly, in their Data & Society report, Britt Paris and Joan Donovan identify the speed and scale at which deepfakes circulate on digital platforms as a major challenge to fact-checking and moderation (Paris and Donovan, 2019: 8), while also warning about the broader spectrum of audiovisual manipulation and the longer history of the politics of evidence (p. 17). Mainly writing from a US perspective, experts largely agree in identifying a lack of legal mechanisms to handle deepfakes (Harris, 2019), recommending multi-scalar solutions (regulatory, technical, social, and commercial) to minimize their threat to personal privacy, democratic processes, and even national security (Chesney and Citron, 2019). Policy recommendations include risk assessment frameworks (Kietzmann et al., 2020) and a variety of responses such as detection, data provenance, and legal accountability (Boneh et al., 2020) aimed at mitigating the diminishing operational costs of deepfake creation and the democratization of media manipulation tools (Hwang, 2020).
Driven by ongoing advancements in computer science, the cat and mouse game between deepfake generation and detection (Mirsky and Lee, 2021: 27) continues to shape how governments and platforms respond to synthetic media.
The history of huanlian in China runs in parallel with that of deepfakes and of media manipulation more generally. Practices of image editing have been part and parcel of vernacular creativity since the early years of internet access in the country, supporting the creation of repertoires of humor and parody shared online and commonly known as egao, or “mischievous spoofs” (Rea, 2013). One of the earliest examples of egao to reach worldwide popularity was that of Little Fatty, a Shanghai high-school student whose face—caught in a photograph with a funny expression—started being superimposed by internet users on countless images including movie posters, historical photos, and famous artworks in 2003 (Meng, 2011: 37). Not yet automated through deep learning and closer to the “cheap fake” end of the media manipulation spectrum (Paris and Donovan, 2019: 11), these photoshopped images nonetheless raised many of the problematic implications identified in deepfakes: Qian Zhijun, the real name of Little Fatty, faced harassment and bullying over his sudden popularity and was offended by the many montages that pasted his face on the body of porn stars (Cheung, 2009: 196). Building on a decade of amateur photoshopped images shared on online forums and blogs, cheap fakes proliferated on Chinese social media throughout the 2010s, and swapping the faces of celebrities, fictional characters, historical figures, and unwitting private citizens for fun became a common practice for producing content ranging from humorous video montages to animated stickers used in messaging apps (de Seta, 2016, 2018). The popularity of machine learning-powered huanlian content should be understood in light of this longer history of media manipulation on Chinese online platforms and not merely reduced to its underlying technological advances and commercial trends.
The automation of face manipulation arrived in China first through news about the emergence of Reddit deepfakes and then thanks to amateurs experimenting with open-source models and software like FakeApp and DeepFaceLab to produce huanlian videos of their own. Reports have noted that Chinese and South Korean deepfake videos started appearing alongside US ones on the major websites hosting this sort of content (Ajder et al., 2019: i). A growing number of research papers by Chinese computer scientists are presented at international machine vision and machine learning conferences, and several have proposed models to generate or detect deepfakes with more accuracy and reliability (Jiang et al., 2020; Li et al., 2020). These models are then offered as open-source tools, shared on code repositories, commercialized through APIs, or implemented by tech companies as innovative functions for their platforms and software products (Figure 2). The Artificial Intelligence White Paper released by Tencent in 2020 recognizes deep synthesis technology as a core domain of artificial intelligence R&D, not limited to the creation of huanlian content but also useful in professional audiovisual production, recommender systems, and content verification (Tencent Research Institute, 2020: 26). Tencent’s own Youtu Lab, for example, applies deep synthesis to a wide range of products through its proprietary DittoGAN model, which is capable of face-swapping, face synthesis, and face merging (Tencent, 2020). Similarly, NetEase’s Fuxi Lab has experimented with the integration of face generation in MMORPG videogames, allowing players to reconstruct their face from a 2D photo into a 3D avatar (Ye, 2019). And ByteDance, whose apps Douyin and TikTok are massively popular, has reportedly developed a face-manipulation function based on deep learning that has not yet been released to the public (Constine, 2020).
Figure 2. Demo for the “Face Fusion API” offered by weather forecast provider Tianqi API at RMB ¥0.1 per call (US$0.015), here used by a WeChat mini-program to merge the user’s face with that of a female model and then embed it into a Lunar New Year digital greeting card. Screenshots by the author.
As in other national contexts, the history of huanlian in China has followed the global flows of technological innovation, as new techniques of automated media synthesis are incorporated into existing local repertoires of vernacular creativity. The popularity of video reenactments like the Trump–Pompeo duet, image generation services such as personalized greeting card generators, and camera app filters for face-swapping selfies exists on the broader spectrum of image manipulation, ranging from the cheap fakes typical of user-generated digital folklore to the automated generation of huanlian content. At the same time, the circulation of huanlian content, tools, and practices has been shaped by processes of democratization, commercialization, and regulation peculiar to the Chinese context. As argued by Paris and Donovan (2019), contextualization and historicization are fundamental to understanding media manipulation. Most writing on deepfakes has focused on examples from the US and on the threats brought by deep synthesis to Euro-American democratic processes and privacy rights; this article seeks to expand research on automated media manipulation to a non-Western context and to situate huanlian in the longer history of Chinese digital media practices. Through the following three sections, I will examine different events and phenomena that have shaped the regulation, commercialization, and circulation of huanlian in China.
ZAO: Novelty, outrage, and regulation
Easily accessible software tools and user-friendly apps such as DeepFaceLab and FaceApp have been key to the popularization of deepfakes around the world. In China, this role was played by ZAO, an app released by social media company Momo 3 on multiple app stores on 30 August 2019, which both propelled huanlian into nationwide popularity and precipitated a heated debate around them. ZAO’s value proposition was simple: users could download the app for free, register an account, upload a selfie, and obtain still images and short videos in which their face was swapped into celebrity photos or clips from movies and TV series—all branded with ZAO’s logo and easily shareable on other social media platforms. By August 31, ZAO had exploded in popularity: even living outside of China, I witnessed the feeds of local social media apps fill with short clips of my contacts swapping their faces into scenes from Hollywood blockbusters and iconic moments from Chinese TV series. With millions of posts sharing user creations, the hashtag #ZAO AI huanlian# shot up the topic rankings on Sina Weibo; the app topped the charts of multiple stores, and its servers crashed repeatedly due to the massive overnight demand (Xu, 2019). Built upon Google’s deep learning technology, ZAO was one among several innovative products incubated by the Momo company (Jie, 2019), which was reportedly facing a dwindling userbase; with its successful launch, Momo attracted a large inflow of users by dramatically lowering the technical threshold of media synthesis and pioneering the commercialization of huanlian in China (Xia, 2019).
ZAO’s success was not surprising: when I installed the app a few days after its launch, I was able to generate amusing huanlian images and videos in a matter of seconds, joining the bustling discussions on social media while also sharing my creations with friends. All I had to do was set up and verify an account with my phone number, then choose my preferred option on the bottom menu ribbon: I could browse “Suggestions” of popular videos, “Create a biaoqing [表情 “expression,” animated GIF sticker],” check on my “Friends,” or export “My Creations.” The core activity of the app revolved around the audiovisual scenes populating the “Suggestions” tab, which is in turn divided into various sub-sections, including “Recommended,” “Followed,” “TV series,” “ZAO Special Selection,” and “Beautiful Faces.” ZAO’s huanlian functions required that I upload at least one selfie—automatically rated for quality and pose—and that I verify it by completing some simple actions in front of the camera, like opening my mouth and nodding (Figure 3). After I did this, an unassuming notification informed me about the app’s terms of consent. Unexpectedly, it was precisely this bit of text that raised the suspicion of several users: just hours after ZAO’s launch, the buzz around the app quickly shifted from playful engagement to heated debate around its handling of private user information. Only one day later, many of the major Chinese newspapers ran opinion pieces about the app’s troubling terms of consent, which appeared to grant ZAO and its parent company the rights to store and reutilize user photos indefinitely (Global Times, 2019) or even sell them to third parties (Xie, 2019), exposing users to well-documented societal risks of artificial intelligence products (Zhang, 2019b).

Figure 3. ZAO’s huanlian creation flow: choice of a video clip or image (left), upload and verification of a user selfie (center), final product ready for sharing (right). Screenshots by the author.
Combined with the app’s collection of phone numbers and facial features for identity verification, ZAO’s terms of consent raised multiple concerns among users: the loss of image rights, the potential for harassment and harm, and the malicious use of biometric data. Suddenly, huanlian were about much more than playful content creation. As a precaution, WeChat temporarily banned the sharing of ZAO content, and Alipay reassured its users that face-swapping could not be used to trick payment systems based on face recognition (Coleman, 2019). By the fourth day after its launch, ZAO buckled under public scrutiny and revised its terms of consent, specifying that user photos would not be used for other purposes and that users would be able to delete their personal information from the app’s servers (Xia, 2019). At the time of writing, the traces of this historic backlash are still very visible across the app’s interface. New users are welcomed by a short video loop introducing ZAO’s main selling points: self-fashioning (“perform in the best of worlds”), convenience (“only needs one photo”), and novelty (“new generation AI technology”), which is immediately followed by a reassuring notice on data privacy. At the selfie verification stage, ZAO reminds the user that the process “does not store identifying information,” offering a link to a “privacy explanation,” which reads:

The “huanlian” effect provided by “ZAO” needs you to provide a photo of your face that meets certain requirements; through our post-processing technology we superimpose it on another one, creating a fictional image. To the naked eye, this fictional image might look like you, but it does not contain your real information. To protect image users from fraudulent use of their image rights, “ZAO” has set up a real-person verification stage: this process is only used to verify that you are the same person depicted in the photos you upload. To ensure your information security, “ZAO” will not store personal information on your facial features. The image rights of your photos belong to you.
Besides this short explanation, reminders about ZAO’s privacy-protection efforts are embedded in many of its functions. In case of failure to verify one’s profile picture, for example, the app’s social media sharing functions are disabled, and all the huanlian content generated by the user is stamped with a “Safe Mode” watermark to limit its unauthorized circulation.
The societal backlash against ZAO’s mishandling of user data did not only impact the app’s design and user agreements but also had a broader effect on China’s national regulation of digital platforms. News articles reporting on the ongoing social media discussions around the app were followed by a long tail of legal commentary and policy recommendations, which were promptly taken under consideration by Chinese authorities (Mo, 2019). The website of the Cyberspace Administration of China (CAC) records a clear timeline of the official response: from as early as September 12 (2 weeks after ZAO’s launch), the CAC website republished seven articles discussing huanlian, including a call for the development of more detailed AI regulation (Zhang, 2019a); a plea for the inclusion of “humanistic and safety genes” in AI development to ensure it remains person-centered (Fu, 2019); several infographics illustrating the risks of huanlian (Liu and Ai, 2019); and an overview of recent regulations on apps collecting personal data (Zhai et al., 2019). As explained by Liao Canliang, director of the People’s Daily Public Opinion Data Center, the Ministry of Industry and Information Technology (MIIT) had launched an inquiry into Momo as early as September 3, demanding that the company abide by existing laws and implement stricter data security mechanisms (Liao, 2019). On November 18, 2019, the State Internet Information Office (SIIO) and other ministries announced the “Regulations on the Administration of Networked Audiovisual Information Services,” to be enforced from January 1, 2020 (Huang, 2019). These regulations included several articles clearly addressing “new technologies and new applications such as deep learning and virtual reality” (Cyberspace Administration of China et al., 2019), stipulating how platforms and providers must ensure data security, flag synthetic content, and avoid publishing false information (Articles 10, 11, and 12).
In less than 3 months, huanlian morphed from playful novelty into an insidious threat to be regulated.
Face-swap black markets and 50-cent special effects
In China, ZAO was undeniably the key actor that brought huanlian generation to the public’s fingertips, profoundly impacting the societal perception of synthetic media and triggering a substantial regulatory response. But even before the launch of Momo’s app, deepfakes had been undergoing a process of commercialization through multiple avenues. While tech companies like Tencent or Momo have incorporated huanlian into their platforms as a technological novelty to attract more users into their ecosystems, other media industries have also sought to profit from the novel accessibility of techniques of audiovisual synthesis. As widely documented by communication scholars, the commercialization of new media formats and products in China tends to split into two parallel markets: an “official” one of legacy media and state-driven development and an “unofficial” one of underground experimentation and unregulated circulation (Li, 2019). Elaine Jing Zhao has termed these the “formal” and “informal” circuits of China’s digital media economy (Zhao, 2019), the former being dominated by state media and commercial conglomerates, and the latter being exemplified by “parallel trade, unauthorized file sharing, free and open-source software development, amateur cultural production and on-demand labour” (p. 1), with digital platforms straddling the line between the two and extracting value from under-regulated gray areas. This section discusses the formal and informal commercialization of huanlian, broadening the discussion beyond digital platform companies and arguing that the ongoing interplay between formality and informality plays an indispensable role in shaping how new technologies like synthetic media generation are domesticated.
Since the earliest examples of deepfakes shared on Reddit, the creation of pornographic content has been the driving force behind the popularization of audiovisual synthesis, and pornography constitutes the vast majority of deepfakes circulating online (Ajder et al., 2019: 1). The same has been the case in China, where news about the global circulation of deepfake porn has portrayed huanlian as chiefly used for this purpose. Given the strict regulation of pornography in China (Jacobs, 2012), deepfake pornography circulates for the most part via peer-to-peer channels and third-party intermediaries rather than through dedicated websites: since 2019, I have seen deepfake porn videos of Japanese idols being occasionally discussed among my WeChat contacts as something “more real than the real” responding to “a desire to capture illusions.” Around the same time, a widely read inquiry by the Beijing News discovered numerous sellers on platforms like Baidu Tieba and Xianyu offering to create personalized huanlian content by swapping the faces of celebrities or private citizens onto adult movies for very affordable prices (Chen, 2019). This gray area extends beyond pornography: other reports have identified e-commerce vendors selling animations generated from still photographs tailored to trick the facial verification systems of different apps (Shen, 2019). As predicted by Robert Chesney and Danielle Citron, the proliferation of media synthesis tools accessible to the general public reproduces historical patterns of black market formation (2019: 1763); Chinese observers have similarly noticed how the underground huanlian market has become a “black industrial chain” not limited to the hosting of finished audiovisual products but also extending to the provision of software and tutorials 4 , allowing anyone to create face-swapped content with minimal requirements of personal data and little to no ethical oversight (Xu, 2019).
While the black market of informal huanlian has thrived thanks to lax regulations and popular demand, the formal commercialization of synthetic media has had to compete with the relatively high bar set by tech companies and amateur creators. A revealing example is the fantasy TV series “Love of a Thousand Years,” based on Shi Silang’s web-novel “The Killing of Three Thousand Crows.” As its 30 episodes aired on the Mango TV station from March 2020, viewers quickly realized that something had happened to the character of Qingqing, played by actress Zhang Dingding—her face seemed to be pasted on, and her expressions and facial features seemed to track inconsistently with her head movements (Yuli, 2020). It was quickly confirmed that the character was originally played by actress Liu Lu, who had in the meantime been put in administrative detention for her refusal to undergo a security check at a train station; in order not to affect the broadcast, the production was forced to recast actress Zhang Dingding in the same role, reshoot some of her scenes, and use huanlian techniques to swap the two actresses’ faces in the rest of the footage, preserving the character’s continuity (Figure 4). The audience’s verdict was unanimous: the face-swapping effect was not up to the technical standards of other media products, it looked cheap and unfinished, and it offended the viewers’ intelligence (Siyuetian, 2020). The Zhang Dingding huanlian became the epitome of a broader category of “fifty-cent special effects,” or cheap visual effects used by TV productions to save money and avoid the more expensive face-swapping technologies pioneered by cinema or video game studios (Li and Xu, 2020).

Figure 4. A scene from “Love of a Thousand Years” featuring the character Qingqing played by a huanlian of actress Zhang Dingding. Screenshot by the author.
With less than 10 minutes of screen time, Qingqing was a minor character in “Love of a Thousand Years,” and her recasting would have probably gone unnoticed if the poorly made huanlian had not caught the audience’s attention. Instead, the TV production’s choice to implement a novel (and more affordable) technological solution foregrounded the slippage between formal and informal huanlian industries; as many spectators noted, the face-swap was of poorer quality than many amateur huanlian created by fans and uploaded on social media platforms (Huanyu yingshi fenxiang, 2021). If believable deepfake porn videos could be bought online for a few hundred RMB, how could a TV product be this cheap? Several comments also pointed to a similar case that had happened one year earlier, when an amateur creator nicknamed Huanlian Ge [“Face-swap bro”] had uploaded several huanlian videos on Bilibili: among them, one replaced the face of Hong Kong movie star Athena Chu with the face of Mainland Chinese actress Yang Mi in a classic scene of the 1994 TV series “The Legend of the Condor Heroes” (People’s Daily, 2019). Huanlian Ge’s creations garnered both admiration and concern, especially from fans of the swapped-in actress, who feared this technology would be used to infringe upon image rights and harm women (Liang, 2019). Now head of a professional huanlian studio in Jiangsu province, Huanlian Ge explained to me how he created the contentious video to showcase the technology’s potential:

Bilibili has already taken some of my videos down because they were too realistic and could mislead audiences […] when the quality is too high, these videos can seem real, and thus go beyond the category of entertainment. (Interview with the author, March 2021)
As these examples show, the commercialization of huanlian beyond digital platforms is shaped by the tension between formal and informal markets; synthetic media are a novelty, and their legitimacy and acceptability are often debated comparatively in terms of privacy risks, ethical implications, and economic profit.
Bilibili’s communities of creative practice
Many of the huanlian videos described or referenced in the previous sections—from the Trump–Pompeo patriotic duet to the Yang Mi/Athena Chu face-swap—were originally uploaded to the Chinese video-sharing platform Bilibili. Inspired by the Japanese website NicoNicoDouga, Bilibili was launched in 2010 by a Chinese web developer and quickly became a popular platform among fans of ACG (anime, comics, and games) content. Alongside anime, TV series, cinema, music, gaming, food, and lifestyle videos, Bilibili also hosts flourishing communities of creators producing humorous content, including video mashups, remixes, and overdubs. Visitors can watch most Bilibili videos without logging in, but commenting and uploading content require a membership, which is granted after successfully completing a 100-question test on ACG-related knowledge. After several attempts, I finally passed the test with a barely sufficient score of 65/100 and became a Bilibili member in July 2020. Most of my activity on Bilibili consisted of exploring categories and search results, watching videos, and reading bullet comments, but I also uploaded some of my own attempts at creating huanlian, commented on other creators' uploads, and chatted with some of them via private messages in order to learn more about their creative practices. In contrast to apps like ZAO and commercial services like special effects studios or black market vendors, which usually hide the creative processes behind their final products, huanlian creators on Bilibili share details about their experiments, praising each other's creations and offering technical suggestions. As researchers have noted, amateur deepfake creators often come together in communities of practice that develop around content-sharing platforms and open-source repositories (Ajder et al., 2019: 4; Paris and Donovan, 2019: 33).
At the time of writing, a query for huanlian videos on Bilibili yields over a thousand results. These include scenes from TV series with actors' faces swapped, celebrity faces swapped onto the creators themselves, and anime characters animated through face puppeteering. What appears at first glance as a hodgepodge of nonsensical content can be navigated by following recurring themes or audiovisual elements like protagonists, scenes, or soundtracks: over repeated views, Bilibili huanlian videos reveal their subdivision into various genres and their intersections with local and global repertoires of digital folklore. One example is the "Dame da ne" category, which includes hundreds of entries. Its name refers to a line in "Baka Mitai" [ばかみたい "I've Been a Fool"], a song featured in the Japanese video game series Yakuza that has been extensively parodied online since 2018. Footage of a YouTube user lip-syncing the song started being used as a driving video for deepfakes around July 2020, and the face-swapped results featuring various internet celebrities covering the song became one of the first global deepfake memes. Bilibili creators have enthusiastically joined the fray, creating their own versions of "Dame da ne" huanlian featuring a variety of local favorites: Japanese anime characters, Chinese actors, tech entrepreneurs like Tencent's Pony Ma, controversial figures like Adolf Hitler and Donald Trump, livestreaming celebrities, and even social media reaction images (Figure 5). This application of deepfake techniques is far removed from the most commonly discussed ones of pornography and deception, and demonstrates how media synthesis technologies are used creatively to mediate between global content flows (in this case, Japanese video games and YouTube memes) and local fandoms and genres.
A choir of “panda head” biaoqing [表情 “expressions,” reaction images] singing the “Dame da ne” song, a huanlian video mashup created by Bilibili user Shenqi de Miaoer Kong (2020). Screenshot by the author.
The huanlian videos that circulate on Bilibili are made in a variety of ways, depending on the creator's skill and the desired type of media manipulation: some are exported directly from apps like ZAO or software like FakeApp, while others are created through open-source repositories and web-based code notebooks. The "Dame da ne" variety of huanlian offers interesting insights into the material life of audiovisual synthesis algorithms, since its origin was directly connected to the publication of a conference paper describing a "First Order Motion Model for Image Animation" (Siarohin et al., 2020). Proposed by researchers from the University of Trento, Snap Inc., and Huawei, this model made it possible to generate videos "combining the appearance extracted from a source image with motion patterns derived from a driving video" (ibid., 1). While the authors envisioned their model as a useful innovation for video editing, photography, and e-commerce applications, the working demo they shared on the Google Colab platform quickly became a veritable meme generator: by simply uploading the original YouTube "Dame da ne" lip-sync as a driving video, anyone could animate a static image of their choice and contribute to the popularity of this deepfake trend. The same happened in China: Bilibili users first republished existing "Dame da ne" videos found on other platforms, then learned about the Google Colab demo and started creating their own "varieties of videos with a refreshing style" (Xiaoxi, 2020). In the span of a few months, a novel model for video synthesis trickled down from a computer science conference to a Chinese digital media platform.
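The model's core idea of separating appearance from motion can be illustrated with a deliberately minimal sketch. The code below is not the First Order Motion Model itself, which relies on learned keypoint detectors and a neural generator to warp real images; it is a toy in which both the source "image" and the driving "video" are reduced to a handful of 2D keypoints, and each output frame simply shifts the source keypoints by the displacement the driving frame shows relative to the first driving frame. All names and values are illustrative.

```python
import numpy as np

def animate(source_kp, driving_kps):
    """Toy motion transfer: move the source keypoints by the displacement
    each driving frame exhibits relative to the first driving frame."""
    reference = driving_kps[0]  # motion is measured against frame 0
    return [source_kp + (frame - reference) for frame in driving_kps]

# A static "face" of three keypoints, and a driving clip drifting rightward.
source = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
driving = [source + np.array([0.1 * t, 0.0]) for t in range(3)]

frames = animate(source, driving)
# frames[0] matches the source (no motion yet); later frames inherit
# the driving clip's rightward drift while keeping the source's shape.
```

In the published model, the same separation happens at a much higher level of abstraction: keypoints and local affine transformations are learned from data, and a dense motion field warps the source image pixel by pixel, which is why a single lip-sync clip could animate any uploaded portrait.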
As a community of practice, huanlian Bilibili creators share not only their creations and comments but also content like textual walkthroughs and video tutorials aimed at introducing fellow users to specific audiovisual synthesis tools—especially when some of these are hosted on platforms inaccessible from China, such as Google Colab. Following the nationwide backlash against deepfakes described in the previous sections, Bilibili has also occasionally removed huanlian-related content from its servers, forcing users to keep updating their recommendations (Xiaoxi, 2020). One example still available at the time of writing is a series of three video tutorials explaining the creation of "Dame da ne" huanlian in detail, uploaded by user HsLotus and viewed hundreds of thousands of times: I teach you how to easily make a "dame da ne" video, whether you can jump over the firewall or not! This song is very popular now, especially when paired with AI-based huanlian […]. In this video, I will teach you two ways of obtaining this effect. After you have learned, you can go spoof your friends or you can retcon your favorite stars, but first let me give you a warning: these two methods are quite troublesome, and you can encounter some weird bugs. I also researched them for a long time before I could get good results. (HsLotus, 2020)
The tutorials proceed to introduce the Google Colab demo and a locally hosted alternative for users without a VPN, as well as detailing the steps to prepare images and video for processing. More technically skilled creators code their own generators from the ground up: as one of them tells me, “I did try the Google Colab tutorial, but it did not work. I used a similar service available in China and wrote my own script in Python, so I did not need to cross the firewall” (Interview with the author, December 2020). Through their sharing, coding, commenting, and translating, Bilibili huanlian creators constitute a community of practice from which new genres of vernacular creativity emerge and circulate before occasionally making their way to other media.
Everyone loves huanlian, everyone is afraid of huanlian
In a short essay published on the WeChat account Hedgehog Commune, industry researcher Yuanzhang argues that huanlian have undergone a process of baicaihua [白菜化 "cabbagification"], meaning that media synthesis has quickly turned from specialized technical knowledge into something accessible and mundane (Yuanzhang, 2019). The "cabbagification" of huanlian in China runs largely in parallel with global trends: from the appearance of deepfakes on Reddit, through amateur experimentation, societal panics, and platform responses, to commercial developments and regulatory proposals, the latest research innovations in audiovisual synthesis are eventually absorbed into formal and informal markets and constrained by legal measures. The history of deepfakes is also a striking example of the speed with which novel algorithmic models and techniques are domesticated: in 2017, University of Washington researchers presented a groundbreaking synthetic rendition of former US president Barack Obama (Suwajanakorn et al., 2017); in 2020, Bilibili users adopted similar techniques to animate Donald Trump and Mike Pompeo singing a Chinese patriotic song. In the few years between the emergence of deepfakes and the proliferation of synthetic media on online platforms, much has happened in terms of the social, economic, cultural, and legal implications of algorithmic images. After briefly introducing the history of deepfakes and huanlian, this article has chronicled three facets of their "cabbagification" in China: the app-driven popularization and regulatory response triggered by ZAO; the commercialization of face synthesis through formal and informal markets; and the communities of practice emerging on platforms like Bilibili.
The main aim of this article was to understand how huanlian are interpreted, created, and circulated by Chinese users, commercial actors, and governmental institutions. Britt Paris and Joan Donovan have argued that historicization and contextualization are fundamental in understanding media manipulation (Paris and Donovan, 2019: 45); similarly, Dan Boneh and co-authors have called for "disaggregating" synthetic content into "as many smaller pieces as possible" (2019: 4). Through my historicization, contextualization, and disaggregation of huanlian, I can offer some conclusions. First, while closely paralleling the global trajectory of deepfakes, huanlian have been shaped by the peculiar features of China's tech industry: Chinese researchers are increasingly contributing to international machine learning and computer vision conferences, and local tech companies like Tencent or Momo have been instrumental in popularizing synthetic media. Second, discursive framing and societal debates determine how synthetic media are domesticated: perhaps fittingly, given the Chinese term's emphasis on "changing faces" rather than the English word's reference to truth and forgery, public debates and regulatory responses in China have focused on practical flashpoints like fraud risks, image rights, economic profit, and ethical implications. Third, synthetic media is not simply about swapping faces with friends or animating celebrity portraits: huanlian threatened to upend trust in biometric systems like face recognition and content verification, proving that machine vision technologies are intricately co-dependent. The future of synthetic media, in China as elsewhere, will likely see new techniques, practices, and genres of content emerge from computational arms races, challenging existing societal systems and cultural mores; historicizing, contextualizing, and disaggregating their trajectories remains a crucial task for research.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Machine Vision in Everyday Life project, which has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 771800).
