Abstract

This article presents a multimodal rhythmic analysis of a TikTok video, adopting a social semiotic perspective on embodied meaning-making. We highlight the importance of rhythm in coordinating and intertwining semiotic modes to produce meaning. The study develops a method for undertaking an integrated multimodal analysis of rhythm across speech, bodily action, gesture and music, and develops a transcription convention for representing this rhythmic unfolding. The data considered is a TikTok ‘glambot/boss challenge’ video featuring a lip sync to audio sampled from former Australian Prime Minister Julia Gillard’s 2012 culturally iconic ‘Misogyny Speech’ condemning misogynist and sexist men, particularly those in positions of power. This speech achieved viral prominence internationally and continues to be a key feminist text in Australian political history. The paper demonstrates how end-accented rhythmic groups create anticipation and lead to the main event in both the speech and glambot sections of the video. Alongside the rhythmic analysis, the article examines the intertextual meanings established with other TikTok videos that iterate the glambot meme and glambot challenge.

Keywords

TikTok, rhythm, social media, multimodality, intertextuality

Introduction: Multimodal rhythm and TikTok

This article introduces a social semiotic method for analysing rhythm in TikTok videos. These videos are multimodal performances in which rhythm plays a crucial role in coordinating a complex orchestration of voice, music, images, gestures, and other semiotic modes. Rhythm is a natural, physiological, social, and semiotic phenomenon. A basic definition of rhythm is the alternation between two polar states, e.g., day-night, ebb-flow (van Leeuwen, 2005). Rhythmic alternations can be found in vital functions of the human body such as the cardiac cycle (diastole-systole), respiration (inhalation-exhalation) and hormone secretion (Bartenieff and Lewis, 1980). Other bodily rhythms can be found in speech (e.g., loud-soft, tense-lax) and movement (e.g., flexion-extension, tension-relaxation, fast-slow) (Bartenieff and Lewis, 1980; Van Leeuwen, 1992). Rhythmic alternation is not between two arbitrary poles – in bodily rhythms, they tend to be alternations between periods of exertion and recuperation e.g., work-rest, tension-relaxation, awake-asleep (Rudolf Laban in Bartenieff and Lewis, 1980: 71), and in semiotic rhythms they tend to be alternations of more or less prominence e.g., LOUD-soft, TENSE-lax (van Leeuwen, 2005: 182).

Rhythmic alternations are continuous and are therefore wave-like rather than discrete alternations between two states (cf. Sheets-Johnstone, 1979; van Leeuwen, 2005). In rhythms of semiotic events, continuous rhythms are segmented into temporal frames or “phrases” by discontinuities in the rhythm (van Leeuwen, 2005: 184). A result of segmentation of continuous rhythms into phrases is that the rhythm is divided into units with distinct beginnings and endings (Bartenieff and Lewis, 1980; Keller, 1973). Rhythms and phrasing can be analysed on multiple hierarchic levels of organisation and timescales e.g., speech rhythms can be analysed at the levels of rhythmic feet, rhythmic groups and rhythmic sequences (Van Leeuwen, 1992) and similarly, musical rhythms can be analysed at the levels of measures, phrases and phrase groups (cf. Cooper and Meyer, 1960; Keller, 1973).

Rhythms play an important coordinating and cohesive function, especially from a social semiotic perspective. Rhythms are important for coordination of our body parts to perform activities e.g., walking. They are equally important for the social coordination of multiple bodies in group activities, as exemplified by the following passage:

In a sequence from a prison work camp film, the exertion/recuperation spatial Effort rhythms of a work action were observed: Three men fell a tree with an axe. They strike with power, one after the other, leaving each man a time of recuperative and preparatory action. Thus, a particular triadic rhythmical sequence is established which allows each to contribute his Strength Effort maximally at the right moment. The action is accompanied by singing which reinforces the rhythm and keeps it going. (Bartenieff and Lewis, 1980: 73)

In a similar way, rhythm plays an important role in the cohesion of semiotic articulations that unfold over time and give them meaningful structure (van Leeuwen, 2005). In particular, rhythm plays an important role in multimodal texts in the coordination and cohesion of multiple semiotic modes and media, as will be demonstrated in the present analysis.

The method for rhythmic analysis developed in this paper aims to account for the complex embodied and digital semiosis that is interwoven in TikTok videos as multimodal performances. TikTok is a popular social media platform that allows users to create, share, and discover short videos featuring a variety of content. The platform’s algorithmic feed suggests videos to users based on their preferences and engagement patterns. TikTok videos incorporate multiple semiotic modes in their short-form format, such as music, visuals, dance, and gestures. The videos often include dance performances, catchy soundtracks and lip-syncing, creative use of captions and hashtags, interactive features like duets and comments, and concise storytelling. TikTok has a number of technical dimensions that enable its users to make rhythmic meanings via different modalities. For instance, beat synchronization allows users to create synchronized videos by aligning their actions, dance moves, or edits precisely to the beat of a song or audio clip. The platform’s looping and repetition features also enable users to emphasize rhythmic patterns, reinforcing the catchiness and memorability of the content through repeated sequences; these are further supported by editing capabilities such as transitions and other kinds of visual effects. TikTok’s focus on choreographed dances facilitates the dissemination of viral dance challenges, which involve coordinated movements and specific rhythmic patterns that users learn and replicate.

Emerging literature within media and communication studies has begun to consider the kinds of digital practices that are developing on TikTok as a platform (Kaye et al., 2022: 169), including some work considering music challenges (Vizcaíno-Verdú and Abidin, 2022) and the collaborative production of duets (O’Toole, 2023). There have also been studies unpacking the implications of the algorithms which determine how users experience TikTok content (Siles et al., 2022), serving as “echoic mechanism[s]” influencing how voices are entangled, remixed, or ventriloquised (Ramati and Abeliovich, 2022: 18). While this work makes some acknowledgement of music, rhythm and beats as integral to TikTok performances, there have yet to be studies that analyse the rhythmic meaning-making in technical detail. There has been some work on the related notion of tempo (how fast or slow the audio or movement is). For instance, Lu and Shen (2023) found that fact-checking videos on Douyin (the Chinese version of TikTok) tend to have a faster tempo than other kinds of videos. As we will discuss later, part of the reason for this scarcity, besides the novelty of the platform, is the difficulty of accounting for multimodal rhythm, both theoretically and in terms of devising transcription and annotation strategies.

Background

A single TikTok video is the focus of analysis of this paper. In order to understand this video, it is necessary to begin with an explanation of the relevant intertexts and generic structures that it draws upon. The video is by the user, @MinorFauna, posted on 31 March 2020, along with the caption:

After multiple requests, I bring you my take on the ICONIC ‘Misogyny’ speech by Julia Gillard with a #glambot twist. #bosschallenge #quarantine

This video is 42 s long and includes an excerpt of Julia Gillard’s Misogyny speech (explained in Australian Prime Minister Gillard’s ‘misogyny speech’ below), which the TikTok user lip-syncs whilst putting on make-up, accessories (Figure 1), and a coat. The speech is accompanied by the backing-track to the song, Boss Bitch by Doja Cat, and once the speech excerpt ends, a short excerpt of the vocal track to Boss Bitch plays. The video itself roughly follows the generic structure of the “glambot challenge” (aka “boss challenge”) on TikTok, which itself is derived from the glambot meme (see The glambot challenge below).

Figure 1.

A still from @minorfauna’s TikTok video.

This video was selected for analysis as it was the most prominent example of a snippet of Gillard’s speech being recontextualised for a mass audience on TikTok. The video is the most popular instance of this type of text and at the time of writing has received 263,000 hearts and 1602 comments from TikTok users.

Australian prime minister Gillard’s ‘misogyny speech’

The Julia Gillard Misogyny speech was a speech delivered by then Prime Minister, Julia Gillard, in Australian parliament, directed primarily at the then leader of opposition, Tony Abbott, on 9 October 2012. Gillard’s speech deflects accusations of sexism and misogyny from Abbott and turns these accusations back on to him. The speech itself is iconic in a contemporary Australian cultural context: it was voted as ‘the most unforgettable moment in Australian TV history’ in a poll conducted by the Guardian Australia in 2020, and since the speech was delivered in October 2012, it has been appropriated and intertextually referenced in many different multimodal forms, including as a choral arrangement of the speech, as an artistic interpretation in the form of a painting, and in protest placards. Across the variety of appropriations of the speech, it becomes clear the speech as a cultural artefact holds semiotic potential and has been used to articulate a variety of meanings based on this potential e.g., to bond over feminist values and/or to call out (sexist/misogynist) men in positions of power.

The glambot challenge

The TikTok video explored in this paper is an iteration of the glambot challenge on TikTok. A glambot is a video in which the subject poses as if they are on a red carpet, and the camera (that is recording the video) moves (e.g., toward the subject) whilst recording in slow-motion. This type of video was created by the photographer and film maker, Cole Walliser, and originated on Channel E in early 2020 as part of their coverage of Hollywood award shows such as the Grammys and the Academy Awards. Channel E’s original glambot footage from these award shows was then used in a meme which was first posted on TikTok on 10 February, 2020¹ and became a meme format with many iterations on TikTok. Some of the hashtags that were associated with this meme include #glambot and #bosschallenge. This meme shares humour about feeling poorly dressed (and/or judging other people for being too formally dressed) in particular social situations e.g., feeling underdressed arriving at school when all the girls arrive in fancy outfits. The meme format follows a generic structure outlined in Figure 2. In determining this generic structure we drew on a social semiotic perspective on genre as encapsulated in Martin and Rose (2008: 6), which conceives of genres as “staged, goal oriented social processes”. The glambot footage is used in the third stage of this structure as a metaphoric hyperbole of being overdressed. The background music used in this meme format is the song, Boss Bitch by Doja Cat. Figure 2 provides examples of memes that follow this same generic structure, with a single screenshot for each stage of the structure.

Figure 2.

Generic structure of glambot meme.

By March 2020, a TikTok challenge developed from the glambot meme. The TikTok challenge involves TikTok users recreating the glambot at home with special effects and makeshift techniques e.g., instead of a camera with a robotic stand that moves automatically, the camera is tied to a rope and hung on a bar so that it can swing. This TikTok challenge arose in the context of the COVID-19 pandemic at a time when many people around the world were in lockdown or quarantining at home. Unlike the glambot meme, this glambot challenge was not so much about sharing humour, but rather about sharing positive feelings and uplifting low spirits during the pandemic. Thus, it makes meanings surrounding self-care and self-empowerment in helpless situations. In addition to the same hashtags that were also associated with the glambot meme (#bosschallenge, #glambot), the hashtag #glambotchallenge also began to be associated with this TikTok challenge. Like the glambot meme, the glambot challenge also uses Boss Bitch by Doja Cat as the background music, however the three stages of the generic structure are distinct. The glambot challenge begins with the original glambot footage from the award shows with a caption stating that the TikTok user attempted to recreate it at home, then moves to shots of the creative process of the makeshift glambot, and concludes with the end result, the glambot itself (see Figure 3).²

Figure 3.

Generic structure of glambot challenge.

Many examples of the glambot challenge can be found that closely follow the generic structure outlined, however there are also examples that slightly deviate from this structure, including videos that contain elements of both the glambot meme and glambot challenge. The TikTok video by Minorfauna under examination in this paper does not strictly follow the generic structure outlined, however it is still important to recognise that it is a variation on this same structure (see Figure 4). The video does not include the first stage of the generic structure (original footage from Channel E). Although there is no run through of the technical creative process of setting up the camera, there is a comparable stage in the structure which portrays the TikTok user’s preparatory process for the glambot i.e., putting on makeup, jewellery, and a coat for the glambot. The final stage of the structure is the same – the actual glambot itself with slow motion, special effects and red carpet posing. The same background music (Boss Bitch) is used, however the vocal track is taken out for the majority of the video and only added at the end after the Julia Gillard speech excerpt ends.

Figure 4.

Structure of Minorfauna’s video as a variation of the generic structure of the glambot challenge. Elements deleted from original generic structure crossed out. Added or changed elements in boldface.

Both the Julia Gillard speech and the generic structure of the glambot TikTok challenge (including the background music) are used as semiotic resources in the video under examination in this paper. The next section introduces our method for analysing how multimodal rhythm can weave together these kinds of resources; the section Intertextuality and rhythm then considers how they are used to articulate meaning in the video.

A method for analysing multimodal rhythm

The social semiotic analysis of the TikTok video under examination involves firstly an identification of the semiotic resources that are used in the video and how they are interrelated in terms of rhythm. For multimodal texts, the most obvious way to do this is to identify the various semiotic modes that are used in the video. The semiotic modes used in this video can potentially be listed as speech (including lip-syncing), facial expression, gesture and bodily movement, music, visual effects (slow motion, zooming in), and written text (accompanying the video). Rather than analysing each mode separately in detail, a rhythmic analysis will allow us to analyse how multiple semiotic modes are combined and coordinated together into a single cohesive multimodal text to articulate meaning. In addition to conceptualising the multiple semiotic modes used in the TikTok video as semiotic resources, it is also possible to conceptualise the pre-existing texts that are referenced and/or incorporated into the video, as well as the generic structures that are followed in other related texts as semiotic resources.

Rhythmic analysis

This section provides both an outline of the social semiotic framework of multimodal rhythmic analysis that is used, and a rhythmic analysis of the video itself. As we mentioned in the introduction to this paper, rhythm is a multifaceted phenomenon, encompassing natural, physiological, social, and semiotic dimensions.

Speech rhythm

To demonstrate the principles of rhythm, let us begin with an analysis of the speech rhythm using nomenclature and a notation system based on van Leeuwen’s (2005) framework. We begin at the lowest hierarchic level of analysis: the measure. Each measure is isochronic (i.e., roughly of the same duration) and contains no more than one “pulse”, a syllable that is “stressed” (i.e., made prominent, accented, emphasised etc.) by means of acoustic cues such as loudness, duration, tenseness etc. It is important to note here that stress/prominence is a perceptual phenomenon: it must be analysed subjectively and cannot be analysed by instrumental measurement alone (Van Leeuwen, 1992: 232). The first line of the speech in the video can be analysed into measures as follows:

I will/ not be/ lect ured a/ bout / sex ism/ and mis/ og yny/ by / this / man / I / will / not /

In this first line, the final six syllables are each given stress (this is referred to as monosyllabic stress). Measures can be grouped together into phrases by discontinuities that segment the rhythm:

[ I will/ not be/ lect ured a/ bout //] [ sex ism//] [ and mis/ og yny//] [ by / this / man //] [ I / will / not //]

Just as stress/prominence is a perceptual phenomenon, so is phrasing, and must be analysed by listening. Now that speech has been segmented into phrases, it is possible to identify the main pulse of each phrase i.e. the pulse that is made more prominent than the other pulses in the phrase:

[ I will/ not be/ lect ured a/ bout //] [ sex ism//] [ and mis/ og yny//] [ by / this / man //] [ I / will / not //]

Phrases can then be grouped together into communicative “moves”, which are demarcated by rhythmic discontinuities that are relatively more pronounced than ordinary phrase boundaries (e.g., by a slightly longer pause). And finally, each move can then be analysed as having a main phrase – a phrase made more prominent than the others:

[[ I will/ not be/ lec tured a/ bout //] [ sex ism//] [ and mis/ og yny//] [ by / this / man //] [I/ will / not //]]

[[^/ And the/ gov ernment will/ not be/ lec tured//] [^ about/ sex ism//] [ and mis/ og yny//] [ by / this / man //] [ not / now //] [ not / ev er//]]

[[^ The/ lead er of the/ op po/ si tion/ says //] [^ that/ peo ple who/ hold / sex ist/ views //] [ and who/ are mis/ og ynists//] [^ are/ not ap/ pro priate for/ high / of fice//]]

[[^ well I/ hope the/ lead er of the/ op po/ si tion has/ got a/ piece of/ pa per//] [^ and he is/ wri ting/ out his/ res ig/ na tion//] [^ because if he/ wants to/ know what mi/ sog yny/ looks like in/ mod ern Au/ stra lia//] [^ he/ doesn’t / need a/ mo tion in the/ house of/ re pre/ sen tatives//] [he/ needs / a / mi rror//]]

The following observations can be made about the speech rhythm of the example:

• The word ‘not’ is frequently made the main pulse of phrases.

• In moves 1 and 2, sexism and misogyny are given individual phrases – they are demarcated by the phrasing as important words in the speech and given equal status as other entire phrases.

• The main phrase of each move is at the end, rather than the start of each move. Thus each move builds up to the most important moment. Monosyllabic stress is used at the end of three of the moves for clear emphasis.
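The bracket-and-slash transcription used above is regular enough to be read mechanically. The following is a minimal Python sketch, under stated assumptions, of how one move could be split into phrases and measures. The function names `parse_move` and `is_monosyllabic` are hypothetical and are not part of the framework described here. The plain-text transcription does not record which syllable carries the pulse, so the sketch recovers segmentation only. The `^` symbol is kept as an ordinary token because the text does not define it. Tokens are the space-separated chunks of the transcription, which only approximate syllables.

```python
import re

# Move 1 as transcribed above (a move is wrapped in double brackets).
MOVE_1 = ("[[ I will/ not be/ lec tured a/ bout //] [ sex ism//] "
          "[ and mis/ og yny//] [ by / this / man //] [I/ will / not //]]")


def parse_move(line):
    """Split one move into phrases, each phrase into measures.

    Returns a list of phrases; a phrase is a list of measures;
    a measure is a list of whitespace-separated tokens.
    """
    text = line.strip()
    # Drop the outer pair of move brackets, if present.
    if text.startswith("[[") and text.endswith("]]"):
        text = text[1:-1]
    phrases = []
    for body in re.findall(r"\[([^\[\]]*)\]", text):
        body = body.strip()
        # "//" closes a phrase; single "/" separates measures.
        if body.endswith("//"):
            body = body[:-2]
        measures = [m.split() for m in body.split("/")]
        phrases.append([m for m in measures if m])
    return phrases


def is_monosyllabic(phrase):
    """Rough proxy for monosyllabic stress: several one-token measures."""
    return len(phrase) > 1 and all(len(m) == 1 for m in phrase)


phrases = parse_move(MOVE_1)
for phrase in phrases:
    print(len(phrase), "measure(s):", phrase, is_monosyllabic(phrase))
```

On move 1 this proxy flags only the last two phrases, which corresponds to the six final syllables described above as carrying monosyllabic stress. Identifying the main pulse and main phrase still requires listening, as noted earlier.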

Generalising about rhythmic analysis, at each level of analysis the rhythm is segmented into communicative units such as phrases or communicative moves, and in each of these units, a particular moment is made prominent, which marks informational importance. The notation system that has been used thus far is suitable for notating speech rhythm; however, here we introduce an alternate notation system (Figure 5) that is more flexible, based on the notation system of musical rhythm used by Cooper and Meyer (1960). This notation system can be used across different semiotic modes and allows for any number of hierarchic levels of analysis.

Figure 5.

Alternate rhythm and phrasing notation system, adapted from Cooper and Meyer (1960).

Rhythm of bodily actions

Thus far, we have outlined an analysis of the speech rhythm and phrasing in the TikTok video. There is, however, more to the rhythm of the video than the speech alone. The video continues after the speech excerpt ends, and there are other semiotic modes that contribute to the rhythm. To fully explore rhythm in this multimodal text, we must also analyse other rhythmic elements and how they coordinate with each other. Turning to bodily movement, there are many elements with alternations that could be described as rhythmic. These include intensity of movement (vigorous∼non-vigorous), movement of upper body along the sagittal plane (leaning forward∼neutral position), openness of eyes and eyebrows (wide open∼neutral openness), verticality (standing up∼seated), and eye contact (direct contact∼looking away). Vigorous gestures, leaning forward, direct eye contact, opening of eyes and standing up are all aspects of movement with strong meaning potential. In the context of this video, such movements can be interpreted as making “confrontational” meanings: the TikTok user, invoking Julia Gillard, appears to be physically and verbally confronting Tony Abbott and the Abbotts of the world, challenging them on their sexism and misogyny. The rhythmicity of these movements makes the movements more dynamic as they are constantly being renewed. Whilst these alternations can be described as rhythms in and of themselves, they also contribute to more complex rhythms and phrasing of the multimodal text. For instance, these movements are used to reinforce moments of prominence in the speech rhythm, and are particularly salient when reinforcing moments of monosyllabic stress:

• On the main phrase at the end of move one in the speech rhythm, [ I / will / not //], the TikTok user makes a vigorous pointing gesture synchronised with monosyllabic stress of the phrase

• In move 2, on the main pulse of [ by / this / man //], the user opens the lip balm vigorously whilst opening her eyes and eyebrows wide, then on the main phrase, [ not / now //], she makes a vigorous, synchronised pointing gesture with the lip balm, and finally on the final phrase, [ not / ev er//], she leans forward on the main pulse.

• On the main (and final) phrase of the final move of the speech, [he/ needs / a / mi rror//], the user begins leaning forward on the first word of the phrase (she also makes a quick, sharp pointing gesture on this word) and keeps leaning further forward until mirror. It is also worth noting she is looking forward and standing up (she stands up at the beginning of the final move).

These elements of bodily movement also contribute to the rhythm and phrasing of the series of actions the TikTok user makes throughout the video. The rhythmic analysis of the actions at the lowest hierarchic level is provided in Figure 6. Most of these actions relate to getting ready for the glambot – putting on make up, jewellery, accessories etc. Some of these actions also relate to gestures and actions that contribute to reinforcing the speech rhythm as discussed above, e.g., when the user makes a vigorous pointing gesture. In many instances, there is an overlap of functions and meanings of actions e.g., when the user leans forward to brush her eyebrows, this is not only an action that relates to the glambot preparation, but it is also coordinated with the accent on the word sexist in the speech. Examining the phrasing, each phrase groups together actions for a single piece of make-up or accessory (e.g., earring, lip balm, coat). The most prominent moment of each phrase is either the main action involved in applying the piece of make-up/accessory, or is a gesture or movement that functions primarily to reinforce speech rhythm accents. This analysis demonstrates that the rhythms of speech and bodily action are intertwined.

Figure 6.

Rhythm and phrasing of bodily actions at lowest hierarchic level of analysis.

The rhythm and phrasing of the actions can also be analysed at higher levels of organisation. The sequence of actions leading up to the glambot can be divided into two sections at the next level of organisation (labelled as II in Figure 7). The first section consists of all the actions done whilst seated, and the second section consists of those done standing and is therefore the prominent section. The standing section is shorter, but it consists of actions relating to the final preparations before the actual glambot posing. On an even higher level of analysis (III), the entire sequence of actions can be interpreted as a preparatory phase which is followed by the main phase, the actual glambot. To fully appreciate how this highest level of rhythmic analysis works, we must also look at the musical rhythm and the function it plays in coordinating all of the semiotic modes together.

Figure 7.

Rhythm and phrasing of bodily actions at higher levels of analysis.

Musical rhythm

The musical rhythm has arguably the most important function in coordinating and reinforcing all of the rhythms we have examined thus far. The music has five distinct sections, with a transitional section between Sections 4 and 5, characterised by a musical “breakdown” and “drop”. Sections 1–4 have no vocal line (instead, the Gillard speech is overlaid), only percussion and bass, and in Section 5, the vocal line of the music enters just after the Gillard speech excerpt finishes. Each section has a unique combination of instruments playing and a unique variation of the percussion/bass riff. Analysing the rhythms of the percussion riffs of each section, the main type of rhythmic group that constantly repeats is an iamb, which is a rhythmic group that consists of an unaccented beat followed by an accented beat. Iambs can be found at the lowest level in the form of two semiquavers followed by a quaver (see Figure 8(a)). The two semiquavers are heard as an anacrusis (aka pick-up or upbeat) that leads to the quaver. This rhythmic motif is found consistently throughout the music in the percussion and also in the vocal part when it enters in Section 5. At a higher level, iambs are created by accents on beats two and four of the bar (see Figure 8(b)).

Figure 8.

Analysis of musical rhythm at lowest hierarchic levels.
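The two lowest-level patterns just described can be written out as a small worked example. This is a sketch under stated assumptions rather than a transcription of the score. Durations are expressed in crotchet beats, so a semiquaver is 1/4 and a quaver is 1/2. The bar is assumed to have four beats, which the text implies but does not state. The helper names `onsets`, `is_end_accented` and `group_iambs` are hypothetical.

```python
from fractions import Fraction

SEMIQUAVER = Fraction(1, 4)  # duration in crotchet beats
QUAVER = Fraction(1, 2)

# The recurring motif: two unaccented semiquavers (the anacrusis)
# leading to an accented quaver. Each item is (duration, accented).
MOTIF = [(SEMIQUAVER, False), (SEMIQUAVER, False), (QUAVER, True)]


def onsets(motif):
    """Return (onset_time, accented) for each note of the motif."""
    t = Fraction(0)
    out = []
    for duration, accented in motif:
        out.append((t, accented))
        t += duration
    return out


def is_end_accented(accents):
    """True if only the final element of the group is accented."""
    return bool(len(accents) > 1 and accents[-1] and not any(accents[:-1]))


def group_iambs(accents):
    """Pair consecutive beats; keep the pairs that are weak then strong.

    Beats are numbered from 1, as in musical practice.
    """
    pairs = []
    for i in range(0, len(accents) - 1, 2):
        if is_end_accented(accents[i:i + 2]):
            pairs.append((i + 1, i + 2))
    return pairs


# Assumed four-beat bar with accents on beats two and four.
BAR_ACCENTS = [False, True, False, True]

print(onsets(MOTIF))
print(is_end_accented([accented for _, accented in MOTIF]))
print(group_iambs(BAR_ACCENTS))
```

The motif fills exactly one crotchet beat and ends on its accent, and the bar-level accents pair up as beats (1, 2) and (3, 4), each pair again ending on the accent.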

The rhythm and phrasing of the music can be analysed at even higher levels, when examining the sections side by side. At level I of this analysis, Sections 1 and 2 form one rhythmic group, and Sections 3 and 4 form another. These groupings can be justified based on the musical similarities and differences between sections – there is a clearer musical disjuncture between Sections 2 and 3 than between Sections 1 and 2 or between Sections 3 and 4. Sections 2 and 4 are more “complete” versions (e.g., additional instruments, gaps in the bass line are filled in, louder dynamics) of Sections 1 and 3 respectively, and can therefore be considered to be the prominent sections of their respective rhythmic groupings at level I. In other words, Section 1 builds up to Section 2, and Section 3 builds up to Section 4. In the case of the latter, this build-up is reinforced by a musical “uplifter” created by filtering techniques.

The breakdown and drop between Sections 4 and 5 create an even greater disjuncture than between any other sections, therefore at level III, Sections 1–4 have been analysed as a single rhythmic group preceding Section 5. Since the drop and the section following it mark the musical climax – the point at which all musical elements are present and the intensity is at its peak (cf. Butler, 2006: 273) – Section 5 is the prominent moment and Sections 1–4 make up the non-prominent rhythmic group at level III. Again, this rhythmic structure creates a sense of preparation for, and leads towards the final, prominent moment. Moving back down to level II, the first rhythmic grouping formed at level I (Sections 1 and 2) is the non-prominent rhythmic group, and the second rhythmic grouping (Sections 3 and 4) is the prominent rhythmic group, since the latter has complex rhythms, louder dynamics, and more instrumental parts, most notably the bass line. At all levels of the musical rhythmic analysis (including the lower levels), rhythmic groups all end with the moment of prominence. The recurring rhythmic pattern across hierarchic levels reinforces the sense of build up towards the drop and final section of music (Figure 9).

Figure 9.

Analysis of musical rhythm at higher hierarchic levels.
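The three higher levels of grouping described above can be modelled as a nested structure in which each group records which of its members is prominent. The following Python sketch is illustrative only. The class name `Group` and the helpers `end_accented` and `leaves` are hypothetical, and the labels I-a and I-b are introduced here simply to tell the two level I groups apart.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Group:
    """A rhythmic group whose members are sections or smaller groups."""
    label: str
    members: List[Union["Group", str]]
    prominent: int  # index of the prominent member


def end_accented(group):
    """True if this group and every nested group end on their prominent member."""
    if group.prominent != len(group.members) - 1:
        return False
    return all(end_accented(m) for m in group.members if isinstance(m, Group))


def leaves(group):
    """Flatten the hierarchy back into the ordered list of sections."""
    out = []
    for m in group.members:
        out.extend(leaves(m) if isinstance(m, Group) else [m])
    return out


# Level I: Section 2 and Section 4 are the prominent members.
level_1a = Group("I-a", ["Section 1", "Section 2"], prominent=1)
level_1b = Group("I-b", ["Section 3", "Section 4"], prominent=1)
# Level II: the second level I group (Sections 3 and 4) is prominent.
level_2 = Group("II", [level_1a, level_1b], prominent=1)
# Level III: Section 5, after the drop, is prominent.
level_3 = Group("III", [level_2, "Section 5"], prominent=1)

print(leaves(level_3))
print(end_accented(level_3))
```

Encoding the analysis this way makes the recurring pattern explicit: the check succeeds only because prominence falls on the final member at every level. A group with its prominent member first would fail the same check.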

When the musical rhythmic structure is overlaid on top of the sequence of actions (Figure 10), it becomes apparent how the music helps to create a sense of build-up towards the glambot and to mark it as the prominent moment of the video. In addition to the music, the glambot is also marked for prominence visually through video editing effects such as slow motion and lens flare effects. By reinforcing a sense of build-up during the actions that precede the glambot and by marking the glambot itself as the main event, the sequence of actions preceding the glambot can be interpreted as a preparatory phase that builds anticipation towards the glambot.

The musical rhythmic structure also plays an important role in reinforcing and coordinating with the speech rhythm. Each speech Move roughly lines up with a corresponding musical Section (from 1 to 4, see Figure 11). Thus, the musical rhythmic structure reinforces the build-up towards the main phrase of the entire speech excerpt at the very end of Move 4 ([he/ needs / a / mi rror//]]). The build-up makes the main phrase a sort of climax for the speech excerpt. This speech climax is reached just before the drop, which is effectively the musical climax. In fact, the speech climax is timed so that it coincides with the “breakdown”, the brief moment in which the music has faded just before the drop (see Figure 12; see also Appendix). The breakdown not only functions in a “traditional” way to build anticipation for the drop that immediately follows (Butler, 2006: 4), but in this instance, it also allows the music to “step aside” momentarily to give maximum impact to the speech climax, just before the music delivers its own impactful climax, which as noted above, functions to give prominence to the glambot. A comparison could be made to the example quoted in the Introduction from Bartenieff and Lewis (1980: 73) of the coordinated rhythm of the men felling a tree with an axe – the contribution of each participant is maximised by striking at the right moment i.e., whilst the other is in its recuperative phase.

Figure 10.

Rhythmic structure of music overlaid on structure of bodily actions.

This section analysed the rhythms of the speech, bodily actions, and music of the TikTok video. These rhythms have similar structural features and are coordinated with each other. A pattern of end-accented rhythmic groups (i.e., rhythmic groups which end with the prominent moment) was found at all levels of hierarchic organisation across all three rhythms, and was most evident in the musical rhythm. This rhythmic pattern helps to construct an extended preparatory phase with a strong sense of goal-oriented movement followed by a relatively short yet impactful climax (cf. Cooper and Meyer, 1960: 129–134). The analysis demonstrated that the rhythm of bodily actions is intertwined with the speech rhythm since many of the accented actions also function to reinforce speech accents. The analysis also demonstrated that the musical rhythm plays an important role in coordinating all of the rhythms and semiotic modes: it not only reinforces the sense of build-up and climax in the other rhythms, but it also coordinates the speech excerpt climax and the glambot climax so that they “strike” one after the other. This maximises the impact of both climaxes, and ultimately makes the glambot the final, most prominent moment.

Figure 11.

Rhythmic structure of music overlaid on move structure of speech.

Figure 12.

Excerpt of spectrogram of audio with screenshots that correspond to time points marked by broken lines. Descriptions of speech and bodily movement are provided with each screenshot (see also Figures 6, 7 and 10). Bar numbers that correspond to the musical score (see Appendix) are marked at the bottom of the spectrogram. The musical climax (the drop) begins at the anacrusis to bar 21. The speech climax is in the second half of bar 20 (which also coincides with screenshot 3), at which point the music has momentarily faded out.

Intertextuality and rhythm

In this paper, we have conceptualised the semiotic resources that are used in the TikTok video under examination as the semiotic modes (speech, bodily action/gesture, music) and the intertexts (already existing texts and generic structures of text types) that are used to make meaning. Intertextual resources used in this video include an excerpt from Julia Gillard’s misogyny speech, the generic structure of the “glambot/boss challenge”, and the song Boss Bitch, which is included in the generic structure of the challenge. By tracing the socio-cultural histories of these intertextual resources, the meaning potentials of these resources become apparent. The Gillard misogyny speech is a culturally iconic speech that is referenced often in a variety of semiotic modes to share feminist values and to enact an identity that condemns misogynist and sexist men, particularly those in positions of power. The glambot challenge is in part about self-care and self-empowerment but can also be used to display the TikTok user’s ingenuity and self-confidence.

Although the TikTok video uses pre-existing texts and text structures to reproduce meaning, it is not merely an exact rearticulation of these texts. The intertextual resources are fragmented, transformed and assembled to form a new text, and the result is a new production of meaning (cf. Hodge and Kress, 1988: 251; Kress, 1993: 176). In a sense, the TikTok video is a kind of multimodal collage: an assemblage of text fragments from various sources, articulated multimodally. To understand the meanings produced in this video, it is not enough to simply identify the intertextual resources. It is also crucial to examine how they are transformed and assembled, or in other words how they have been ‘resemioticised’ (Iedema, 2003). In particular, the multimodal analysis of rhythm undertaken in this paper has been a productive method for examining this assemblage of semiotic resources.

The speech excerpt used in the video has been semiotically transformed in many ways from the original text. The excerpt is a small fragment of an entire speech that lasts 15 minutes. The excerpt thus has a beginning and an ending distinct from the original speech. The point of departure of the speech is a declaration that negates Abbott’s accusations of sexism and misogyny (I will not be lectured…) and the point of arrival is a short, sharp reversal of those accusations (he needs a mirror). The speech excerpt is slightly shorter than the musical excerpt, and both are edited to coordinate with each other. The playback speeds of both the music and speech audio are subtly tweaked to help align speech moves to corresponding musical sections. The musical “breakdown” is edited so that the music fully fades out, and a brief pause is inserted in the music track so that the drop is timed just after the speech climax. The vocal track is edited out of the music to foreground the speech audio, and the vocal track is reintroduced just after the speech excerpt ends. The speech excerpt only uses the audio from the original misogyny speech, not the video recording of Gillard delivering it. Instead, the TikTok user records her own gestures and bodily actions as she lip-syncs the speech, and adds visual effects to the final part of the recording.

The video recording of the user’s actions, the visual effects and the background music are all elements of the glambot challenge structure. However, there are some differences between the structure of this iteration of the glambot challenge and the generic glambot challenge structure outlined in Figure 3. The video fully omits the first stage of the generic structure (original glambot footage). As for the second stage of the generic structure, instead of the creative process of setting up a makeshift moving camera, this stage (which is the first stage of the video) entails the user’s process of putting on makeup and accessories. Despite this obvious difference, it nevertheless serves a similar function to the original, i.e., construing the preparatory and creative process that precedes the main event, the actual glambot. The final and main stage, the actual glambot, does not deviate from the prototypical generic structure. This stage is much shorter than the preceding stage, just as in other glambot challenge videos. Although the video uses the same background music (Boss Bitch) as other glambot challenge videos, this too has been edited in particular ways as detailed above.

The multimodal rhythmic analysis demonstrated how the speech, music and bodily action rhythms are coordinated and intertwined with one another. Accents in the rhythm of the bodily actions constantly reinforce accents in the speech rhythm, thus allowing the viewer’s attention to move back and forth between the meanings made in the speech excerpt and those made in the glambot preparation. Although the speech excerpt ends before the end of the video, the music is a constant from start to finish and its rhythm has an important role in providing an overall structure to the video. A notable feature of the rhythms of speech, music and bodily action is the prevalence of end-accented rhythmic groups. As discussed above, these end-accented rhythmic groups create a sense of build-up or preparation that leads to the main event. These are especially prevalent in the music, which features end-accented rhythmic groups at all hierarchic levels of analysis, thus reinforcing this type of rhythmic grouping in the speech and bodily action rhythms. The music thus plays an important role in creating anticipation in extended preparatory phases that lead to the main event (cf. Cooper and Meyer, 1960) in both the speech and glambot. For the speech, the main event is the final move, he needs a mirror, and the preparation is the rest of the speech excerpt, in which Julia Gillard negates claims of sexism and misogyny from Abbott and prepares to flip them back on him. For the glambot, the main event is the glambot itself and the preparation is the process of putting on makeup and dressing up for the glambot. As discussed in the section on musical rhythm, the musical rhythm gives maximal impact to both main events because they are timed one after the other. The ordering of these two main events is also significant. The speech climax comes first and marks the end of the speech excerpt, leaving the glambot as the final moment and main event of the entire video.

By integrating the various semiotic resources, the multimodal rhythm plays an important role in coordinating and articulating the complex set of meanings in this video. There are two main sets of meanings made in the video: meanings associated with calling out misogyny and sexism, which are primarily made by the speech excerpt, and meanings associated with flaunting self-confidence, femininity and power, which are primarily associated with the glambot. The intertwining of the rhythms of speech and bodily actions allows these two sets of meanings to be articulated together, resulting in the performance of a “girlboss” identity revolving around (neo)liberal feminist values. This girlboss identity, often deployed ironically on TikTok, has been critiqued as part of “a collective feminist politics that lacks positive identity markers; an identity predominantly based on disidentification with imagined mainstream others” (Chen and Zeng, 2022: 2). It has also been criticised for adopting an aesthetic grounded in capitalist notions of the “enterprising self in combination with themes from different waves of feminism” (Alexandersson and Kalonaityte, 2021). Other aspects of the video further contribute to the construction of this identity, including the song lyrics (“I’m a bitch, I’mma boss, I’mma shine like gloss”) and the identity of the speaker: Julia Gillard, who at the time of giving the speech was the Prime Minister of Australia, is an exemplary woman in a position of power and is therefore a “bonding icon” (Stenglin, 2004) in this context. The ordering of the respective main events of the two sets of meanings (i.e., the speech climax and the actual glambot) is also significant. Once the video reaches the speech climax (the moment at which Gillard delivers her attack on Abbott), the speech excerpt ends and so do the meanings associated with calling out misogynists. At this point, the misogynists have been completely “dealt with”, and the video then proceeds to the glambot, in which misogynists are no more and the focus is solely on the celebration of the user’s glamour and power.

Conclusion

Our study contributes to the understanding of the role of rhythm in multimodal communication and the construction of meaning in digital media. Through our analysis, we show that the TikTok video is a multimodal collage of fragmented and transformed intertextual resources that form a new production of meaning woven together through rhythm. This key role of rhythm observed through our analysis supports van Leeuwen’s assertions about the centrality of rhythm to multimodal meaning-making:

Neither language, nor action, nor music, is indispensable in the structuring of multimodal texts that unfold over time. What is indispensable is an element all three have in common, rhythm (van Leeuwen, 2005). Rhythm provides cohesion, segments the speech, or the action, or the music into communicative moves that propel the semiotic event forward. And rhythm is also the physical substratum, the sine qua non of all human action. Everything we do has to be rhythmical and in all our interactions we synchronize with others as finely as musical instruments in an orchestra. Without rhythm we fall over and trip each other up. (van Leeuwen, 2011: 169)

We have thus offered a detailed method for annotating rhythmic choices that can be used by analysts undertaking close analysis of how meanings are made in digital video.

The rhythmic analysis has been a central component of the social semiotic analysis in this paper. An affordance of TikTok is that videos are very short, so users must find ways of being economical with time in their meaning-making. The rhythmic analysis demonstrated how a rich set of meanings has been carefully condensed and coordinated across multiple modes, using a variety of resources, into a very short video. This in turn allows us to appreciate the level and type of multimodal literacy (involving basic audio-visual recording and editing skills) that was once a specialised competency and is now becoming more generalised among users of multimedia social media platforms, in particular among “content creators”. A notable feature of the rhythmic structure at the highest level of organisation is an extensive preparatory phase that leads to a short yet impactful main event. This structure devotes a large proportion of the video to the creative process that leads to the main event, and little time to the main event itself. It is interesting to note that, both for this particular TikTok video and for the generic structure of the glambot challenge, the creative process behind the main event is a notable feature. This is perhaps also a reflection of the rise of “content creators”: the showcasing of creative skills and the semiotic construction of a content creator identity have become as important meanings to be made in social media content as the main content itself. Future work might consider how this creative role, together with the multimodal affordances of platforms such as TikTok, is changing the ways in which new kinds of feminism are being articulated in relation to highly valued texts such as Gillard's misogyny speech.

Footnotes

Declaration of conflicting interests

Author Michele Zappavigna is a member of the Editorial Advisory Board of Multimodality & Society. The author did not take part in the peer review or decision-making process for this submission and has no further conflicts to declare.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Correction (January 2026):

Author disclosure statements have been updated.

ORCID iD

Joshua Han

Appendix

Figure 13.

Musical score with rhythmic annotation for the ‘Misogyny speech #glambot version’.

Author biographies

Joshua Han is a Junior Researcher in the School of Music, Theatre and Art at Örebro University undertaking research in multimodality, social semiotics and music. Joshua completed his doctoral thesis on the social semiotics of music and movement at the University of New South Wales, Australia.

Michele Zappavigna is Associate Professor in the School of Arts and Media at the University of New South Wales and a co-editor of the journal Visual Communication. Key books include Searchable Talk: Hashtags and Social Media Metadiscourse (2018, Bloomsbury) and Discourse of Twitter and Social Media (2012, Bloomsbury). Recent co-authored books include Researching the Language of Social Media (2014; 2022, Routledge) and Modelling Paralanguage Using Systemic Functional Semiotics (2021, Bloomsbury). Forthcoming are Emoji and Social Media Paralanguages (Cambridge University Press) and Innovations and Challenges in Social Media Discourse Analysis (Routledge).

References

Alexandersson and Kalonaityte (2021) Girl bosses, punk poodles, and pink smoothies: girlhood as enterprising femininity. Gender, Work and Organization 28(1): 416–438.

Bartenieff and Lewis (1980) Body Movement: Coping with the Environment. London: Psychology Press.

Butler (2006) Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music. Bloomington, IN: Indiana University Press.

Chen and Zeng (2022) ‘Gaslight, gatekeep, girlboss’: memefied femininities and disidentification in TikTok youth cultures. AoIR Selected Papers of Internet Research. https://spir.aoir.org/ojs/index.php/spir/article/view/12988/10867

Cooper and Meyer (1960) The Rhythmic Structure of Music. Chicago, IL: University of Chicago Press.

Hodge and Kress (1988) Social Semiotics. Cambridge: Polity Press.

Iedema (2003) Multimodality, resemiotization: extending the analysis of discourse as multi-semiotic practice. Visual Communication 2(1): 29–57.

Kaye DBV, Zeng and Wikstrom (2022) TikTok: Creativity and Culture in Short Video. Cambridge: Polity Press.

Keller (1973) Phrasing and Articulation. New York, NY/London: W. W. Norton & Company.

Kress (1993) Against arbitrariness: the social production of the sign as a foundational issue in critical discourse analysis. Discourse & Society 4(2): 169–191.

Shen (2023) Unpacking multimodal fact-checking: features and engagement of fact-checking videos on Chinese TikTok (Douyin). Social Media + Society 9(1): 20563051221150406.

Martin and Rose (2008) Genre Relations: Mapping Culture. London: Equinox.

O’Toole (2023) Collaborative creativity in TikTok music duets. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. Hamburg: Association for Computing Machinery, Article 791.

Ramati and Abeliovich (2022) Use this sound: networked ventriloquism on Yiddish TikTok. New Media & Society 14614448221135159.

Sheets-Johnstone (1979) The Phenomenology of Dance. London: Dance Books Ltd.

Siles, Valerio-Alfaro and Meléndez-Moran (2022) Learning to like TikTok … and not: algorithm awareness as process. New Media & Society 14614448221138973.

Stenglin (2004) Packaging Curiosities: Towards a Grammar of 3D Space. (Doctoral Thesis) University of Sydney.

van Leeuwen (1992) Rhythm and social context: accent and juncture in the speech of professional radio announcers. In: Tench (ed) Studies in Systemic Phonology. London: Bloomsbury, 231–262.

van Leeuwen (2005) Introducing Social Semiotics. London: Psychology Press.

van Leeuwen (2011) Rhythm and multimodal semiosis. In: Dreyfus, Stenglin and Hood (eds) Semiotic Margins: Meanings in Multimodalities. London: Bloomsbury, 168–176.

Vizcaíno-Verdú and Abidin (2022) Music challenge memes on TikTok: understanding in-group storytelling videos. International Journal of Communication 16: 26.

Multimodal rhythm in TikTok videos: Exploring a recontextualization of the Gillard ‘misogyny speech’
