Abstract
Video-based communication is increasingly common online. This article looks at the hand signs that are used in lip-syncing videos on the app musical.ly and argues that they constitute a codified, non-verbal language of pictograms that is equivalent to emoji in text-based communication. Seeing the lip-syncing videos as performances allows us to situate the hand signs as the latest development in a long history of formalized gestures for theatrical performance. By recognizing this emergent gestural language as a performative element of online communication, we can reinterpret emoji and other forms of visual supplement to writing as part of a larger emerging system of communication where visual signs are becoming integrated into verbal language, not only in written language but also in video-based communication.
Gestures are an important, non-verbal element in face-to-face communication, and as online communication is becoming more video-based, gestures are becoming common online. However, gestures are being used in a more codified manner online than in face-to-face communication. In this article, I argue that this is equivalent to the use of emoji as a codified way of expressing affect in text-based online communication, and that these non-verbal sign systems can be understood as developments in the history of codified gestures for performance. Specifically, I look at hand signs as they are used on musical.ly, a video-sharing app that launched in 2014 and is popular among preteens and teens. Its users, or musers as they are called, share 15-second videos where they select a soundtrack and lip-sync or mime a performance. 1 The performances frequently include a rapid succession of hand signs that interpret and enhance the lip-syncing and that have evolved as a collective system of codes. Most of the hand signs enact or mimic the concept they reference, just as emoji tend to visually resemble the object or concept they refer to. For instance, a tilted head with a hand under one cheek signifies sleep, night or tonight, and a hand held in the shape of a gun signifies kill, death, or die. 2
I argue that hand signs on musical.ly are pictograms: visual representations of objects that signify a closely related concept, 3 and that they have developed as an augmentation of video-based online communication similarly to the way emoji developed as an augmentation of text-based online communication. The emergence of a gestural language on musical.ly may be an early example of a resurgence of structured gestural communication that will become more common with the rapid growth of video-based communication in social media.
While text-based messaging and social media hide the body and its gestures, visual social media such as musical.ly can make the user’s body visible to viewers. But although lip-syncers on musical.ly can use gestures and body language, they cannot use their own voice, and the app won’t let them add text to the video. I propose that when online communication genres limit the available modes of communication (for instance, to text-only with no images, or video-only with no sound or text), users will tend to compensate by developing shared conventions to codify elements that cannot otherwise easily be expressed in the limited modes available. This has led to the development of emojis and animated gifs as well as to hand signs on musical.ly.
A second argument in this article is that musical.ly lip-syncing videos are performances, not simply in Goffman’s generalized sense of all self-presentation being a form of performance (1959), but in the theatrical and musical sense of the word. Using the history of performance to understand performance on musical.ly reveals a historical lineage for the hand signs in 18th century theater and classical oration, both of which also developed a codified system of hand signs to express emotion. Using this historical context to understand emoji allows us to expand Goffman’s notion of performance in a way that helps explain a key phenomenon in contemporary self-presentation in social media: the use of emoji and other pictograms.
My approach is twofold. First, I discuss how hand signs are currently used on musical.ly, supplementing the description and analysis of a typical example of a lip-syncing video with a discussion of YouTube tutorials created by and for musers explaining how to use hand signs on musical.ly. Then, I trace the historical precursors to the hand signs through two lineages. First, I discuss the remediation of gestures and affect that led to informal punctuation, emoticons, and emoji. Next, I discuss the history of communicative gestures, including informal everyday gestures, formalized gestural languages like semaphore, Plains Indian Sign Language and sign languages for the deaf, and most importantly for my argument about hand signs as elements in a performance, the history of codified gestures in oration and theater.
What is Musical.ly?
Musical.ly is an app where users can share 15-second lip-syncing videos and other short videos. Users can select soundtracks from the extensive music database available on musical.ly, use songs they have on their phone, record live sound, or use sound from other musers’ videos and skits. They can add visual filters and emojis to their videos like on other video- and image-sharing services, as well as apply less common time-based effects, such as a time loop, where you select a second of the video that is repeated a few times. Adding text is not an option.
The app was launched in Shanghai in October 2014 by Alex Zhu and Luyu Yang and has rapidly grown to become one of the most popular apps globally for tweens and young teenagers. On an average day in mid-2017, 13 million videos were uploaded to the app, which has grown from 10 million users in 2015 to over 200 million users in September 2017 (Herrman, 2016; Perez, 2017; Robehmed, 2017). The rapid growth of the app is all the more notable for its very specific demographics: it is extremely popular with tweens and young teenagers, but many adults have not heard of it—unless they have young people in their life who have shown it to them.
Ideally, this article would include videos showing how the hand signs are used, or at least provide references allowing readers to view the videos I discuss in full. I have not done this for practical, legal, and ethical reasons. It would be practically difficult, as musical.ly is a mobile-only platform with no permanent URLs for videos; there are legal issues involved, as recording videos is against musical.ly’s Terms of Service; and it would be ethically problematic, as many if not most musers are minors and, given the mobile-only platform, they reasonably expect a limited audience. I would suggest readers either search the musical.ly app themselves or look at compilations posted on YouTube. One example of a popular muser who uses hand signs in lip-syncing videos is Marco Cellucci, a French-Italian 14-year-old boy who has over half a million fans on musical.ly and has compiled a video of some of his best lip-syncing videos (Cellucci, 2016). Nina Schotpoort is an 11-year-old Dutch girl who also has more than half a million followers on musical.ly and runs her own YouTube channel. Some of her YouTube videos show her in musical.ly lip-syncing duets with her fans (Schotpoort, 2017a) or showing users how the interface works (Schotpoort, 2017b). In the next section, I will analyze one anonymized lip-syncing video in detail and then discuss two hand sign tutorials posted to YouTube (Baby Ariel, 2015; Nigerias Blessing, 2015).
Hand Signs on Musical.ly
A typical example of how hand signs are used on musical.ly is shown in Figure 1, which shows selected frames from a lip-syncing video of a 15-second sample from the song “Too Good” by Drake and Rihanna. It was made by a young girl in Norway in August 2016, and I found it by scrolling through the most recently posted videos from the town I was in at the time. 4 Hand signs on musical.ly usually only use a single hand (as the other hand holds the camera) and they almost always replicate or enact selected words in the lyrics, as you see in the video shown in Figure 1. For the first phrase in the lyrics, “I’m way too good to you,” the muser first holds up two fingers to signify “too,” then quickly shifts to a thumbs up for “good,” goes back to two fingers up to signify “to,” then points at the camera for “you.” The whole sequence takes less than a second.

Sketches showing a sequence of hand signs used in a lip-syncing performance of the lyrics “I’m way too good to you.”
In the second part of the video, shown in Figure 2, the muser doesn’t use hand gestures for every word, as in Figure 1. The lyrics for this part of the video are a little more abstract: “You take my love for granted/I just don’t understand it.” In the first line, the muser chooses to sign “you” (index finger pointing at camera) and “love” (hand held in the shape of half a heart). The last line is shown in just one sign: the finger points to the side of the head.

These sketches show the hand motions used in the second part of the musical.ly video. The lyrics for this sequence are, “You take my love for granted/I just don’t understand it.”
Hand motions on musical.ly are almost always direct interpretations of words in the lyrics. Sometimes, as with the use of two fingers for “too” and “to,” the signs refer to homonyms of the word used, although we could also interpret the two-fingered sign as signifying a “2” as often used in texting, rather than signifying “too” and “to”: “I’m way 2 good 2 you.”
A musical.ly lip-syncing video is a performance. The lip-syncer uses the soundtrack to perform the sound in much the same way as a musician reads sheet music to play a tune. The muser listens to the music and the words, practices singing along, and chooses which words to represent visually through hand motions. Although most musers tend to use the same set of signs (some of which are listed in the appendix), there is some leeway for personal interpretation. In Figures 1 and 2, this is evident in the selection of which words to sign and which not to sign, but also in the conclusion of the video, which is without words. After the part of the video shown in Figure 2, the muser puts her hand in front of the camera, making the screen go black. Then the video cuts to a slow-motion sequence of her first looking down, then smiling, her hand moving in toward her chin as though she is going to lean on it. She appears to be about to blow a kiss to the viewer (or to her own image on the front-facing screen of her mobile phone camera) when the video ends, or rather, loops back to the beginning.
Most hand signs used on musical.ly are directly representational: they refer to a specific word in the lyrics (see the appendix for a list of some common hand signs). There are also some signs that are non-representational, or that function more as a dance, or to emphasize the beat of the song. For instance, when moving the camera (which is done almost as a form of dance move), many users hold their left hand at the bottom of the frame, palm up, and little finger slightly raised as they move the hand from right to left. In a tutorial posted to YouTube about musical.ly hand motions, muser Nigerias Blessing explains the importance of camera motions (Nigerias Blessing, 2015). Camera motions certainly enhance the hand signs, and the ability to use camera motions effectively is a marker of real skill and prestige. In her video, Nigerias Blessing explains how the camera often should move in the opposite direction of the hand sign. Certain hand signs, in Nigerias Blessing’s view, need to be combined with appropriate camera motions. For instance, the sign for “drop” involves holding your hand with fingers down, tight together, as though holding something, then opening the fingers as though dropping that thing. Blessing suggests holding the hand quite high, and keeping it high throughout the sign, while moving the camera down. She embellishes the “shoot” sign by shaking her camera a little bit, mimicking the rat-tat-tat jolts of a series of bullets being shot. For hitting, she makes a fist, aims it at the camera, and moves the fist and camera toward each other as though they are about to collide. The integration of camera movements and gestures is a fascinating difference between these hand signs and hand signs used in face-to-face communication or in stage performances.
Nigerias Blessing’s video is the most thorough tutorial I have found on musical.ly hand signs. Other popular tutorials tend to be far less specific. By mid-2017, Nigerias Blessing’s tutorial had been watched almost 300,000 times on YouTube, although she is not a musical.ly superstar, with fewer than 2,000 followers on musical.ly. Baby Ariel, on the other hand, is one of the most popular users on musical.ly, with over 20 million followers (Sherman, 2017). By the age of 15, she had established herself as a professional social media influencer based on her success on musical.ly. Several of her YouTube videos give users advice on how to make good musical.ly videos. Baby Ariel’s (2015) advice is less specific than Nigerias Blessing’s advice: “I shake my camera a lot. I don’t know how I do it—I . . . shake it [looks into her mobile camera while gently shaking it] . . . like, really softly, but it looks like, bah! [Looks into camera again, moving camera sharply]—and it turns out good!”
In another video, Baby Ariel (2016) teaches her mother to make a musical.ly video. Here, she explains in detail how to move the camera either in the opposite direction of the hand or following the hand.
Camera movements are, as these tutorials show, an important part of the musical.ly aesthetic, but appear not to signify directly on their own, as many of the hand motions do. Instead, the camera movements “look good,” as the tutorial makers say, often visibly searching for more specific words as they speak but ending up with “good.”
This ability to make it “look good” clearly involves a fair bit of skill and experience. It is not a skill that can be learnt simply from watching other videos or tutorials, or by your friend showing you. You need to practice a lot in order to succeed. Baby Ariel’s video of her mother learning to use the movements demonstrates this by showing how inept her mother’s first attempts are, and by showing that although the mother improves with practice, she still lags far behind her daughter’s skill level.
The skill required to understand hand signs on musical.ly is not only on the part of the performer (the lip-syncer) but also on the part of the viewer of the lip-syncing video, who has the pleasure of interpreting the hand signs. In this sense, the hand signs can be seen as a puzzle, or more directly, a rebus, as journalist Clive Thompson (2013) has argued of emoji. Thompson notes that the uninitiated often find emoji to be annoying and argues that this annoyance or disdain also applies to rebuses, citing linguist Michael J. Preston, who in a paper on rebuses and graphic riddles wrote that “Just as a pun is conventionally met with a groan, the rebus is often acknowledged by a statement of disdain, unless, of course, one knows a rebus or two and can respond in kind” (Preston, 1982, p. 119). In their paper on emoji, Stark and Crawford (2015) reiterate Thompson’s point. This disdain is similar to the reception lip-syncing videos receive when spread beyond specialized sites like musical.ly. Luckily for lip-syncers, those who would disdain the puzzles of hand signs do not tend to spend much time on musical.ly, though no doubt, as lip-syncing becomes more mainstream, it will be mocked more, following the pattern we already know from blogs, selfies, and other forms of self-representational digital culture (Burns, 2015; Rettberg, 2014, pp. 17–19).
The History of Pictograms in Online Communication
The term remediation comes from Jay Bolter and Richard Grusin’s book of the same name (Bolter & Grusin, 1999). They discuss how every new medium remediates older media by paying homage to, rivaling, and refashioning them. Following this, I am using the term to argue that new modes of communication remediate old modes of communication. Personal letters remediated oral communication, email remediates personal letters, and so on. Non-standard punctuation and drawings were typical features of personal correspondence and informal letters and notes to friends and were used as a way of “encoding affect in written code” (Kataoka, 1997, p. 105). Italics, capitalization, repetitions, and ellipses are also examples of how writers encode affect and the immediacy of an oral communication situation into writing (Lakoff, 1982). In this way, informal writing remediated oral conversation. Email writers used the same techniques and developed new ones, adapting typography to create emoticons like the smiley face :-) and other combinations of punctuation marks (Rezabek & Cochenour, 1998). Toward the end of the first decade of the 21st century, emoji were introduced, and have largely, though not entirely, supplanted emoticons. Emoji began as proprietary symbols on specific Japanese platforms, but in 2010, emoji were added to Unicode standard version 6.0, meaning that they work the same as letters in the alphabet or punctuation marks, and can be represented in many different fonts and on different platforms. Many emoji remediate emoticons, like the smiley face ☺, but there are also many new emoji, some referring directly to a physical object, like the coffee cup, and others gaining different meanings in different communities (Miller et al., 2016).
Around the same time, animated gifs became an increasingly popular addition to textual communication. The use of animated gifs mixed into verbal communication is a phenomenon that can be seen as a bridge between emoji and hand signs in musical.ly videos. Response gifs are often gestural and are used as supplements to and comments on verbal communication, for instance, in messaging or on Tumblr, and they add a layer of affect to the verbal text (Miltner & Highfield, 2017; Highfield & Leaver, 2016; Tolins & Samermit, 2016). Although new response gifs are constantly being created, response gifs are also becoming more codified as platforms allow users to search for gifs by typing in the emotion they want to express.
Until recently, online communication was primarily written. Images and other visual aspects were secondary; they complemented, responded to, or expanded upon the written text. Platforms like YouTube, Snapchat, Instagram, and musical.ly invert this relationship between text as dominant and image as complementary by making the visual primary. Snapchat is interesting in this context, because it is not primarily verbal, but emoji are still heavily used as overlays to images and videos, or as part of a written caption superimposed on the image. Figure 3 shows two examples of how emoji are used on Snapchat. On the left, we see how the social media celebrity DJ Khaled uses emoji in his snaps much as he does in his tweets. He also often posts images with a caption containing no text, only one or more emoji. He rarely uses emoji to augment the image directly, as the Norwegian social media influencer Supermarie does in the snap on the right. Here, she uses several different techniques to augment her image. A selfie lens gives her face an animal nose and ears. She has added a caption, stating (in Norwegian), “I AM TOO YOUNG TO DIE” all in capital letters, but followed by three laughing emoji (“face with tears of joy”). She has also used emoji outside of the caption field: there are five “fire” emoji placed to the upper left of her face. Taken together, this is a complex multimodal message suggesting perhaps that she is exhausted, or that something difficult has happened, but also that she has a sense of humor about it. Notice too that both DJ Khaled and Supermarie have included emoji in their user names.

On the left, a snap by the celebrity DJ Khaled (@DJkhaled) shows a photograph of a necklace, with a caption including two key emoji and two praying hands emoji. On the right, a snap by Norwegian @supermarie showing her with closed eyes and a selfie lens filter on her face, a caption stating “I am too young to die” with three “face with tears of joy” emoji, and five fire emoji in varying sizes positioned above her face. Screenshots from Snapchat 9 December 2016.
Musical.ly videos do not include writing, but there is verbal content: the words sung that the muser lip-syncs to. Lip-syncing is of course not only a digital activity. In addition to being a long-established amateur pastime, lip-syncing has a strong tradition in drag and queer culture (Kaminski & Taylor, 2008). While gestures and facial expressions have of course always been integral to lip-syncing and miming, they have not typically been used as on musical.ly, where hand signs signify specific meanings that are recognized by a wide community. However, gestures are common in oral, face-to-face communication, and there are many gestures that have fixed meanings that can be codified. While sign language for the deaf is the most obvious example, it differs substantially from the hand signs used in musical.ly because it is an independent language where gestures are the primary mode. To understand hand signs in musical.ly as pictograms that support and expand upon another dominant mode of communication, it is more helpful to consider them in relation to informal conversational gestures and to the codification of gestures in oration and theater.
Communicative Gestures
Gestures are fundamental to human communication. Most humans can manage some basic communication without words, for instance, when trying to buy an item in a shop in a country where we do not know the language.
Gestures are often made unconsciously and may still communicate. We may slump when we are feeling dejected or raise our eyebrows slightly upon seeing somebody we like. In his book Bodytalk, Desmond Morris catalogues many gestures, both those made almost unconsciously and those made deliberately, noting their meanings and the localities where they are used. For instance, Morris lists standing arms akimbo (“hands on hips so the elbows protrude from the body”) and explains that this gesture is used globally to signify “Keep away from me!” and in Malaysia and the Philippines can also be used to signify seething rage (Morris, 1994). Some gestures are always made deliberately, like the “rude finger” or digitus impudicus, which is also an example of how a gestural sign can have extraordinary longevity: Aristophanes punned upon it in The Clouds, written in 423 BC. There is documentation of its use in Roman times, and it is still common today (Robbins, 2007).
Sign languages can also be complete languages, with morphologies and grammatical rules of their own. Before developing spoken language, early Homo sapiens used gestures to communicate, possibly with quite complex syntax (Corballis, 2002). We know that structured sign languages existed well before modern times to complement or replace speech, such as the Plains Indian Sign Language, which the Arapaho, Cheyenne, Lakota, Blackfoot, Comanche, Paiute, and Crow peoples in North America used to communicate across different language communities and for ritual purposes (Carayon, 2016). The most common formalized sign languages today are sign languages for the deaf, which began to be formalized in the 18th century (Knowlson, 1965) and are now used all over the world as national sign languages with complete grammars and morphologies (Stokoe, 2005, original publication 1960). Semaphore (which can be used with hands as well as flags) and Hindu dance gestures are other examples of specialized gestural languages.
On musical.ly, lip-syncers generally use signs that are already common in everyday life, such as holding up fingers to signify a number, or signs that mime a meaning, much as we do when trying to communicate without words in a face-to-face situation. For instance, if we were trying to express that we were thirsty or to offer somebody a drink, without using words, we might curl the fingers as though holding a glass or bottle, and move this in a “drinking” gesture near the mouth to mime the motion of drinking. If we were struggling to hear somebody in a crowded nightclub, we might cup a hand to our ear and lean forward. Signs like these are equivalent to pictograms in writing: they visually mimic that which they represent, much as emoji do.
Multimodality: Words, Images, and Gestures on a Screen
With touch screens, we have become increasingly accustomed to using gestures as codes or signs that have very clear meanings. We shake our phones to undo a mistake and swipe down when watching a story on Snapchat to go back to the overview. Sometimes these gestures of interaction go beyond mere navigation and can be seen as what Ian Bogost (2007) calls procedural rhetoric: “persuading through processes in general and computational processes in particular” (p. 3), as has been argued of the swipe left to dismiss an unpleasing suitor on Tinder (David & Cambre, 2016). Cultural practice is evolving to incorporate these new modalities, both in vernacular communication and artistic genres, but we still lack a sophisticated rhetorical understanding of gestures of interaction (Verhulsdonck & Morie, 2009).
In her analysis of kinetic poetry and other forms of electronic literature that use the affordances of digital media, Alexandra Saemmer (2013) proposes that we consider combinations of gestures and words or images to be pluricode couplings, which “involve two different semiotic systems, a text and an icon, within the same active support of the sign.” In the works of electronic literature that Saemmer discusses, the gestures can either be movements the reader must make in order to access the poem (touching the screen, clicking a link, dragging a word) or movements made by the words and letters on the screen. Semioticians have used the term multimodal for some time to describe a text that uses several modalities, for instance, an ad in a magazine that uses both text and image, or a movie, where moving images are combined with sound (Kress, 2010).
In a musical.ly lip-syncing video, I propose that we think of the hand signs as an independent modality. Thus, we can identify at least four distinct modalities: the music, the lyrics (the words sung being a linguistic modality), the moving image, and the hand signs.
The image in a magazine ad illustrates the text and expands the meanings and connotations of the ad as a whole. The relationship between the lyrics and the hand signs in most musical.ly lip-syncing videos is even closer than the relationship between image and text in an ad. Rather than expanding the meaning, the two modes of words and signs repeat each other. In Alexandra Saemmer’s words, there is a coupling between word and hand motion. The word “love” and the hand held in the shape of half a heart against the chest are a pluricode coupling. In a sense, then, the hand motions are redundant. They simply repeat the meaning that is already conveyed in the lyrics.
Lip-Syncing as Performance
If we instead think of the hand signs as an enactment or a performance of the lyrics rather than as a representation in itself, the apparent redundancy makes more sense. Young people have listened to and performed the popular music they love for generations. Pre-teen girls in particular perform, adapt, and share the music they love privately and with friends (Baker, 2001). The mobile phone is a perfect companion to such bedroom culture (Kearney, 2007), as it is both an intimate device and a communication device. Pre-teen culture can also affect adult culture, as Kyra Gaunt (2012) has shown in her study of how pre-adolescent girls’ game-songs have influenced hip hop music.
What has changed with apps like musical.ly is (1) the scale of the audience with which young people can share their performances (potentially to the world, not just their classmates and neighbors); (2) the scale at which they can access other versions of performances of the same music; and (3) the fixity of the media they use, which allows them to record and edit their performances far more easily than earlier generations could. It is not surprising that a global platform for sharing the kind of performances of popular music that tweens and teens for so long have engaged in would be hugely popular.
A lip-syncing video with hand movements can be viewed as a performance of a set text, much as a musician plays a tune from sheet music, an actor acts from a script, or a parent sings a popular lullaby. Certainly, interpretation and individual choices are involved, but these performances can generally be recognized as performances of an already established text. The musical.ly app encourages this by allowing users to see many performances of the same song. By tapping the small circle in the lower right-hand corner when watching a video, we arrive at a page that shows information about the song that was performed, with thumbnail videos of other performances of the same song (see Figure 4).

The page for an individual song includes moving thumbnail videos of its most popular or most recent lip-syncs beneath. The top three shown here are musical.ly celebrities @lisaandlena, @babyariel, and @mackenzieziegler, who all have several million followers.
This availability and even celebration of multiple interpretations of a set text is an established pattern of the internet today. The set text need not be an image or a song. Knitting is another creative, everyday activity that has been revolutionized by the Internet. Like lip-syncers, who used to perform in their bedrooms or on the playground, knitters used to knit at home or with close friends. With platforms like Ravelry.com, knitters can share their projects with a far greater audience and connect images and descriptions of each sock or sweater they knit to the patterns the items are based on. This creates a database where you can often see several hundred different ways in which a single pattern has been worked up, and you can compare how various knitters have adjusted the pattern to their liking by using different colors, different yarns, or changing the pattern. Before the Internet, in the age of mass media, knitters usually only had access to a few local friends who knitted, and to the official patterns published by yarn companies and sold in yarn shops. With the Internet, creativity blossomed in knitting, not least because of the possibility of seeing hundreds of different ways that a single pattern could be knitted. Memes are another creative area where we see many interpretations of an original, and where databases, such as Knowyourmeme.com, allow users to browse through catalogues of variations on a shared original. These modes of creativity encourage a way of thinking in which the original pattern or meme is not something to be revered, but something to be developed and made one’s own.
On musical.ly, creativity is similarly encouraged by the constant invitation to create your own version. When you watch somebody else’s video, you have the option to “start duet now!” which means making your own recording of the same song, which musical.ly will edit together with the first one you saw to create a duet. Or you can click the circle in the lower-right corner to see information about the song, other examples of videos made for it, and a tempting, bright yellow button labeled “shoot now!” inviting you to create your own version. This is an environment designed for remaking and adapting other people’s content, and thus an ideal environment for the rapid development of a shared gestural language.
Chironomia: Codified Gestures in Performance
Gestures have been studied for their role in supporting verbal communication since the ancient Romans, at least, when Quintilian and others described how orators used gestures and bodies to strengthen their message. Cicero coined the term chironomia in his De Oratore (55 BC) for the study of non-verbal communication through hand and arm gestures that accompany speech (Verhulsdonck & Morie, 2009).
In the 17th and 18th centuries, scholars studied and analyzed gestures of actors and produced dictionaries and systems for understanding these gestural languages, describing the field as chironomia or chirologia (Austin, 1806; Bulwer, 1644). These codified gestural languages for theater are in many ways similar to the hand signs on musical.ly. Some of the same gestures are used in musical.ly videos, such as the first gesture shown in Figure 5, a finger held up to the lips to signify silence, documented in Andrea de Jorio’s (1832) book on everyday Neapolitan gestures. The theatrical gestures described by Austin and Bulwer were specifically for actors to use to accompany spoken words in a performance. They are an encoding of affect, similar to emoji, emoticons, response gifs, and hand signs on musical.ly.

These Neapolitan gestures are from a 19th-century book (de Jorio, 1832, p. 427) and signify the following: 1. silence; 2. no; 3. beauty; 4. hunger; 5. to mock; 6. weariness; 7. stupid; 8. squint; 9. to deceive; 10. cunning.
Such formalized gestures fell out of fashion in theater with the realism of the 19th century. Giorgio Agamben argues in his “Notes on Gesture” (Agamben, 2000) that it was photography that killed gestures, not simply in theater but in society in general, by freezing them in time. “By the end of the nineteenth century,” Agamben writes, “the Western bourgeoisie had definitely lost its gestures” (p. 49), and in silent movies and a few other genres, “humanity tried for the last time to evoke what was slipping through its fingers forever” (p. 53). Agamben sees photography as killing gestures by locking them into still slices of time: “images are the reification and obliteration of a gesture (it is the imago as death mask or as symbol)” (p. 54). Cinema, on the other hand, has its center in the gesture, Agamben writes. While this is a seductive line of argument, I must point out that a freezing of gestures was achieved by the detailed drawings of gestures in books on chironomics (as shown in Figures 5 and 6) well before photography was invented. The urge to freeze gestures and categorize them may be more related to the Enlightenment desire for categorizing the world with dictionaries and encyclopedias than to the 19th-century technology of photography, although it certainly became more ubiquitous with photography. Perhaps it is also significant that photography froze the actual gestures of individual people, whereas the drawings of earlier years were more generalized and abstract.

A chart from John Bulwer’s (1644) book Chirologia, showing different hand gestures to be used in oration, with their meanings.
Different communication technologies emphasize or obscure gestural communication. Photography and drawings freeze gestures. The predominantly written communication of the early years of the Internet hid gestures altogether, although people tried to reinscribe gestures into their writing through emoticons, non-standard punctuation and spellings, and verbal descriptions. Smartphones with cameras and increased broadband access made images central to social media from around 2008 onward (Rettberg, 2014, p. 3), but following Agamben, these still images would only have frozen gestures into death masks. However, video-based communication like musical.ly makes gestures central again.
Agamben sees cinema as a technology that rekindled gestures in our culture. Silent cinema in particular required large and expressive gestures, as it had to do without sound, and verbal language was limited to short captions (not unlike Snapchat). Now, when video is gaining importance in online communication, it makes sense that gestures are also regaining prominence. Importantly, gestures on musical.ly are codified, deliberate representations rather than the largely unintended, or at least unplanned, gestures of face-to-face communication. Like emoji and response gifs, and like the codified and exaggerated gestures of oration, pre-realistic theater and silent cinema, they are codified versions of the gestures we use in face-to-face communication. Hand signs on musical.ly are not a language intended to replace verbal communication, like sign languages for the deaf or semaphore. Hand signs on musical.ly augment verbal communication, lending emphasis to the words that are sung in the music, providing anchorage as Roland Barthes (1977) might have said (pp. 35–37). They are intended to enhance performance, just as the hand gestures catalogued by Bulwer were intended to be used by actors and orators performing to an audience.
Conclusion
Hand signs, emoji, and other pictograms are the current instantiations of a tendency in human communication that can be traced back to the orators in Ancient Rome. When ancient orators or 18th-century actors stood on a stage and were seen only at a distance by their audience, they compensated by developing a set of codified gestures to amplify and interpret the words they spoke. In musical.ly lip-syncing videos, we see the muser up close but on a tiny screen, and we do not hear their voice. They compensate by developing a set of codified hand signs to amplify and interpret the words sung in the song they are lip-syncing. When people write online, their tone of voice and their body language are not available to their readers. They compensate by developing a set of codified emoji to amplify and interpret the words they type.
Until now, we have thought of emoji and other forms of non-standard punctuation that are used to enhance writing in digital media as a way to bring into a text-only medium those aspects of non-verbal communication that are necessary to conversation. But if, as I have argued, hand signs on musical.ly are analogous to emoji in texts, we need to rethink the way we understand not only hand signs but also emoji. Emoji are simply one of many possible kinds of pictogram that have developed and will be developed as part of online communication. If emoji developed to augment textual communication, and hand signs are developing to augment video-based communication, we could speculate that other codified sign systems will develop when other modalities become dominant, at least if communication is limited, as in text or on musical.ly, so that only some modalities are available. Both emoji and hand signs are remediations (Bolter & Grusin, 1999) of the gestural communication that is so fundamental to human conversation. As video and other visual forms of communication become more common online, we will continue to see how the human need for gestures leaks into digital communication.
Appendix
List of Selected Hand Signs Commonly Used on Musical.ly With Their Meanings
| Meaning | Hand sign |
| --- | --- |
| Bed/sleep/night | Tilt head, leaning against hand, palm away from head |
| Cold | Wrap hand around self, shake camera in shivering motion |
| Come here | Beckon with index finger |
| Cry/tear | Touch finger to cheek, pull down as though a tear is running down cheek |
| Die/death | Make a gun from thumb and two pointed fingers and point it at your head, or hold hand horizontal and slash across throat |
| Drink | Hold hand to mouth and move as though drinking from cup or bottle |
| Drive | Hand on imagined steering wheel, move back and forward |
| Face | Hand moves down against side of face, palm toward face |
| Half/middle | Hold hand flat and vertical, 90° to face, thumb toward face, little finger to camera, and (optionally) move up or down, moving camera in opposite direction |
| I/me | Thumb points to self |
| Lie | Fingers up, palm toward face, move hand down in front of face while wiggling fingers as though in intricate pattern |
| Look | Hand held horizontally above eyes |
| Love | Make the shape of half a heart with your fingers and thumb |
| Money | Rub fingers together as though holding paper money between them |
| No | Shake head or hold index finger up and shake back and forward |
| Numbers | Fingers held up, palm facing self |
| Peace | Peace sign (two fingers held up, palm facing camera) |
| Pray/hope/miracle | Same as “half/middle,” although here the reference is to two hands held palm to palm as though in prayer |
| Run/leave/go | Move fingers as though running |
| Sing/talk/say | Hold hand in front of mouth and move thumb against fingers to mimic a mouth opening and closing to speak |
| Stressed out | Hands against top of head, upset expression on face |
| Take | Begin with outstretched open-palmed hand, move toward self while closing hand |
| Think/wonder | Tap one or more fingers lightly against side of head |
| Time | Hold forearm horizontal and look at wrist as though looking at a watch |
| You | Index finger points to camera |
Acknowledgements
The author thanks Aurora Goga for her hand-drawn renditions of the lip-syncing video shown in Figures 1 and . An early version of the paper was presented at the workshop “Social Media as Semiotic Technology,” organized by Søren Vigild Poulsen and Gunhild Kvåle at the University of Southern Denmark. The author is grateful for feedback and encouragement from participants in the workshop, and for feedback from Social Media + Society’s peer reviewers.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this paper was done as part of the author’s job at the University of Bergen, for which she received a salary. The final revisions were completed while the author was a visiting scholar at MIT, with a stipend from the Meltzer Foundation.
