Abstract
Recipe videos are among the most viral genres of videos on social media. Yet, little research has been done on their aesthetic and formal attributes, especially on how they operate within the frameworks of the attention economy and embodied interaction specific to social media interfaces. This paper examines recipe videos published on Tasty, one of the most popular Facebook pages in the world. We analyze these videos through a three-dimensional model that integrates their semiotic characteristics (visual, auditory, and textual), their interactive and haptic qualities, and their invitation to perceptual engagement and sensorimotor response. We conclude that Facebook recipe videos are exemplary of a broader category of social media videos which we call hyper-sensory videos: these create heightened multisensory experiences that take precedence over informational use or narrative involvement. Hyper-sensory videos present a cultural response to broader questions regarding materiality, presence, and embodied relations within a highly mediated social reality.
Keywords
‘I was supposed to be writing. Instead, I found myself in a waking Facebook dream of cheesy French pull-apart bread, and whiskey-iced tea, ice cream donut holes, and pulled pork porchetta sandwiches. Easy-bake artichokes zipped past; honey BBQ chicken wings and strawberry cotton candy cocktails […] I watched clip after clip of anonymous hands crafting perfectly plated dishes. I wasn’t particularly hungry. I wasn’t particularly bored. And I definitely won’t be making any of them myself. But none of that mattered. I was captivated – and so is the Internet’ (Greenberg, 2016).
Greenberg is one of several journalists who have written about their impressions of recipe videos on Facebook. Lasting less than a minute, the videos are mainly shot from an overhead angle which primarily captures the hands of the chef and the speedy preparation of colorful food. Popular on Facebook, but also widespread on other social media platforms, the videos have been described as mesmerizing, compelling, and satisfying (Evans, 2016; Greenberg, 2016). The compulsive character of such responses may help explain why recipe videos are among Facebook’s most popular genres. By December 2022, Tasty – the most prominent recipe video Facebook page – had garnered more than 106 million followers (Tasty, n.d.).
This paper examines recipe videos as a significant indicator of sensory, affective and cultural economies linking audio-visual content, device and platform affordances, and embodied user practices and experience. Such videos are not merely cooking tutorials, even though they present detailed recipes. Rather, they utilize the Facebook interface to capture users' attention in a frantic, constantly changing environment, overcoming obstacles like mute auto-play or users' habitual reading and swiping routines to offer overtly sensory encounters. Furthermore, these videos can be productively located in relation to other genres of social media video, such as ASMR videos, DIY videos, slime videos, and others. Together these genres, which we call hyper-sensory videos, utilize audio-visual elements to give precedence to heightened sensation over narrative, informational, or emotional aspects, granting the videos a synaesthetic potential which can appeal to senses like taste and touch.
Despite their surprising popularity – and extensive media coverage and imitations of Tasty videos by ordinary users and commercial competitors – there is no research literature on Facebook recipe videos. This paper thus contributes to a growing academic interest in sensory videos on social media (Gallagher, 2019; Maddox, 2021; Nansen and Balanzategui, 2022). It is also informed by the insistence of ‘new materialism’ on the materiality of digital media (Dourish, 2017), by sensory studies (Howes and Classen, 2014), and by theorizations of an intensified desire for presence in contemporary culture (Gumbrecht, 2014). Examining Facebook recipe videos opens new paths for thinking about the emergence of cultural forms on social media, contexts of embodied reception and encounter, and platform and cross-platform affordances.
We begin by discussing Facebook recipe videos as a visual form, along with the conditions of attention, interruption, and interaction that are pertinent to the social media context in which they are encountered. We then apply a three-dimensional analytical model which integrates the videos’ multimodal semiotic characteristics along with their adaptation to the social media interface and the embodied, gestural, and sensorimotor relations between users and screen devices. Using the model to analyze sensory videos on Facebook contributes to understanding the physical character of our connections to technology, and new practices of embodying presence in social media usage.
Video on Social Media
Web videos and social media videos have been discussed in the research literature, yet there is no commonly agreed definition of them as a genre, a format, or a medium. Eder (2018) argues that online videos (not only on social media) constitute a medium with unique characteristics: low production costs enable a variety of non-professional, user-generated content, distribution is networked and algorithmic, and ‘reception is characterized by usually small, often mobile screens, by higher degrees of distraction, but most of all by new possibilities for interaction and participation’ (2018:184). Burgess and Green (2018) observe that while approximately half of YouTube videos are user-generated and created especially for the platform, a considerable number are ‘quotes’ – videos originally produced for traditional media and uploaded afterward to the platform. Facebook recipe videos are, in the main, neither user-generated nor ‘quotes’: they are professionally produced for circulation on Facebook.
The architectures and user interfaces of distinctive platforms produce different contexts of engagement with video. Videos are subjected to algorithmic processes of selection and personalization, which eventually determine which videos are made visible to whom. No less significantly, watching one video clip tends to insert viewers into a ‘clip chain’ of viewing further videos, a process frequently facilitated through ‘auto-play’ functions (Lovink, 2008: 12). In contrast to YouTube, where users can easily search for specific videos, and videos are displayed in the center of the screen, on Facebook, videos are displayed in a frenetic and highly competitive temporal environment: the Facebook newsfeed, a continually flowing ‘social media data-stream’ (Hochman, 2014). The data stream is composed of posts, announcements, images, chats, and other messages that are ‘refreshed’ in real-time. Items appear in the stream and aggregate there in the form of a constantly updating and moving stack, across which the user can scroll backwards and forwards (Hochman, 2014). The feed is produced as a continuous flow, in which, even when content is not made especially for users, it is placed there by the platform – seemingly in users’ best interests – waiting to be discovered by them (Lupinacci, 2021). Of course, food has long been salient content of this ‘data stream’; the web is filled with ‘food porn’, recipe videos, cooking tutorials, mukbang videos, and more. As iPads enter the kitchen and recipe videos spread to living rooms and bedrooms, Tania Lewis (2020) suggests digital food moves us beyond narrative to embodied practices.
The great popularity of recipe videos in this challenging habitat requires a high degree of adaptation to Facebook’s platform affordances: as Simonsen and Krogager (2021) note, Tasty videos were originally created specifically for Facebook. Here we mean not just technical affordances, like the use of the like button or viewers’ comments, but also ‘higher-level affordances’ (Bucher and Helmond, 2018), which represent common values of the platform’s users and which shape their behavior. Given a situation in which the viewer’s immersion is difficult to achieve, recipe videos need to produce what Novak calls eversion, ‘a casting outward of the virtual into the space of everyday experience’ (2002: 311), creating a sensory overflow from the screen. To catch the user’s eye in the frenetic Facebook feed, they have to engage in a form of dialog with the viewer, who is continually urged to click something, swipe, ‘like’, comment, or open a link. The videos need to be geared towards the bodily activation of the viewer, as well as to their perceptual, cognitive and emotional engagement.
Our claim, then, is that the dynamism of the Facebook feed as a ‘social media data stream’ (Hochman, 2014) requires the development of distinctive aesthetic strategies to facilitate and guide user engagement. Employing these strategies enables recipe videos to thrive in a stimulus-crowded and activity-oriented visual environment operating in the networked ‘attention economy’ (Davenport and Beck, 2013; Goldhaber, 1997). We briefly elaborate on these claims in the next section.
Facebook Video and Continuous Partial Attention
In 2006, Thomas Friedman described a shift from the ‘age of information’ to ‘the age of interruption’ (Friedman, 2006). Despite its hyperbolic character, this statement is clearly pertinent to the Facebook newsfeed, where information constantly flows in large quantities. At the same time, additional stimuli are ever present via the digital device, as we have already outlined: personal chats and messages, adverts, alerts from other applications, etc. This creates an unremitting condition of ‘continuous partial attention’, in which the user is positioned as a perpetually connected ‘live node on the network’, always alert to the possibility of response (Stone, 2008). This condition is particularly acute in social media since they emphasize the experience of connectivity, acting as ‘the means by which, through constant “pings,” users assert their relevance as nodes in the network’ (Rose, 2010: 43).
The heightened interactivity of new media means that continuous partial attention is embodied as a physical and sensory state, performed in our postural and gestural relations to our devices and in the hand-eye coordination demanded by interaction with the screen. The interaction with the interface (screen, keyboard, mouse, or touch screen) requires the harnessing of bodily participation and a constant reorganization of attention (Crawford et al., 2019). More precisely, it is manifested as ‘operative attention’: micro-scale multi-sensory alertness to the interactive and operational possibilities of the user interface (Frosh, 2018). The design of the Facebook newsfeed inculcates a mode of continual reading-in-motion, in which ‘looking and doing occur at the same moment’ (Verhoeff, 2009:288), scanning while scrolling, whereby one hand or finger always rests on the mouse, the trackpad, or the touchscreen. Hence, the length of time users spend watching a video on Facebook frequently depends on the time required to scroll down through the visible section of the feed on the screen.
Facebook video, like other digital formats, participates in the ‘attention economy’ (Goldhaber, 1997), whereby attention becomes an increasingly scarce and valuable resource precisely in an age of exponential information growth (Davenport and Beck, 2013). Since Facebook video competes primarily for viewer attention (a feature reinforced by visible metrics such as views, likes, and comments and the buttons that facilitate them), highly successful formats such as recipe videos employ aesthetic strategies designed to shape the user’s sensory experience (for instance, the relations between vision, gesture, and touch) and sensorimotor response. It is to these strategies that we now turn, beginning with a brief explanation of our case study.
Corpus Formation and Analytical Procedures
Using Facebook recipe videos as a research object enables us to engage with the consequences of a particular social media interface: this is especially pertinent since, as we have already noted, recipe videos are among the most viral genres on Facebook. Nevertheless, our analytical model also scrutinizes interface elements that may appear in other social networks and is applicable to them. Within this genre of recipe videos, our decision to focus, in particular, on the Tasty community is also guided by popularity: Tasty is one of the ten most popular Facebook pages globally (Statista, 2022). An additional reason for focusing on Tasty is that they are responsible for the production and consolidation of a fixed video format designed specifically to take advantage of the Facebook newsfeed.
The genre of recipe videos pre-exists Tasty, emerging on blogs, recipe websites, and even TV. A ‘hands-only’ directing style was introduced in the 1960s French TV magazine Dim Dam Dom, in which table manners were demonstrated by anonymous hands in an ‘avant-garde style’, emphasizing the refinement and elegance expected of the show’s audience (Danet, 2022). Early online recipe videos included shots of hands but also face shots and long verbal explanations. In December 2013, Facebook introduced the mute auto-play feature as the default viewing mode for videos on the platform, but only in November 2014 was the first recipe video created that replaced verbal explanations with textual captions and background music. This video was three minutes long. In contrast, on its establishment in 2015, Tasty imposed a strict time limit of less than one minute per video, in addition to a fixed style formula which we detail below. Within a year, Tasty had created a family of niche Facebook pages, such as Proper Tasty for English recipes, Tasty Demais for South American cuisine, Goodful for healthy cooking, and Tasty Junior for children (Griffith, 2016). These pages varied the content of the videos displayed but retained and reinforced the use of a single style format across all pages.
In addition, competitor recipe video pages such as Cooking Panda and Tastemade adopted Tasty’s formula. This formula has even survived threats to Tasty’s business model: in August 2017, when Facebook introduced in-stream video ads, which would only appear in videos of 90 seconds’ length or more (Marshall, 2017), Tasty ‘lengthened’ its videos simply by stitching existing videos together in a sequence, at least initially (longer videos have since appeared, though the stitched films are still prominent). Tasty’s initial visual formula also governs Tasty videos on other platforms, such as YouTube, Instagram, and TikTok (although Tasty videos are even shorter and faster on TikTok, the resemblance to the original formula is apparent).
Since sharing is a key indicator both of viewers’ active engagement on social media and of the virality of a particular text (Wittel, 2011), our research corpus consists of the 15 most shared videos from the launch of the Tasty Facebook page until the beginning of the research project (July 2015–March 2017). Videos were chosen according to data extracted using the Netvizz application (Rieder, 2013). The corpus was kept small primarily to enable in-depth formal and interpretive examination of the videos but also because the stability of the format was almost immediately apparent. At the time of study, these 15 videos had been shared between 2,710,928 and 5,187,577 times. They included a range of recipes, from main courses to desserts, pastries, meat dishes and vegetables, healthy food, as well as sugar-rich recipes. We could not detect any clear connection between the type of recipe or dish and the popularity of the video.
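The selection rule described above (the most shared videos within a fixed date window) can be illustrated with a minimal sketch. It assumes the share counts are already available as simple records; the `VideoPost` fields, the `most_shared` helper, the sample data, and the exact first and last days of the window are hypothetical and do not reproduce the Netvizz export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VideoPost:
    """One video post. Field names are hypothetical, not Netvizz's."""
    post_id: str
    published: date
    shares: int


def most_shared(posts, start, end, n=15):
    """Return the n most-shared posts published between start and end (inclusive)."""
    in_window = [p for p in posts if start <= p.published <= end]
    # Highest share count first; posts with equal counts keep their input order.
    return sorted(in_window, key=lambda p: p.shares, reverse=True)[:n]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        VideoPost("a", date(2015, 8, 1), 300),
        VideoPost("b", date(2016, 1, 5), 900),
        VideoPost("c", date(2017, 6, 1), 5000),  # outside the window
    ]
    # Assumed window boundaries for "July 2015–March 2017".
    for post in most_shared(sample, date(2015, 7, 1), date(2017, 3, 31)):
        print(post.post_id, post.shares)
```

Ranking on a single metric keeps the selection reproducible, though the resulting corpus depends on when the share counts were collected.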
The procedures for analyzing these videos arose from our view that traditional semiotic analyses, while important, need to be supplemented by an investigation of the embodied and sensory aspects of the encounter with user and platform interfaces. Traditional semiotic analysis is concerned with the signifying attributes and potential meanings of visual texts (Kress and Van Leeuwen, 1996); however, we add an embodied dimension to the analysis, expanding beyond semantic and interpretive aspects to sensory parameters. As Treske notes: ‘What is becoming relevant is that online video is touching, or even merging with, object spaces in various forms and practices’ (2015: 46). This decision builds on theoretical insights regarding the phenomenological centrality of embodied experience and activity in human-computer interaction (Dourish 2004), and in work on mobile interfaces (Farman, 2012). We therefore developed a three-dimensional framework that retained the specificity of video analysis while being attuned to the conditions of display and engagement in the Facebook interface. This framework addressed 1) the videos’ visual, auditory, and textual semiotic attributes, 2) the involvement and activation structures of the videos as elements of the Facebook interface, and 3) the embodied, sensory, and sensorimotor relations made possible between users and screen devices. All videos in the corpus were subjected to this systematic analysis (the illustrative figures shown in later sections are exemplars of characteristics recurring across the corpus).
The first procedure consisted of creating written transcripts describing the visual techniques used in the videos, roughly based on the schema proposed by Diana Rose (2000). Each shot was analyzed according to Rose’s categories: visual content, shot type, frame size, and point of view. In addition, we recorded editing styles and effects and elements such as captions and sound. Given the absence of schemas for analyzing the potentialities for viewer involvement in videos on social media, for the second procedure, we adapted categories of vividness and interactivity originally proposed by Steuer for examining telepresence (1992) and further developed by other scholars (Fortin & Dholakia, 2005; Sohn, 2011; Kang, O’Brien, Villarreal, Lee & Mahood, 2019). Vividness refers to the ability of technology to produce a sensuously rich mediated environment, while interactivity refers to the degree to which users of a medium can influence the form or content of the mediated environment. The third analytical procedure focuses on the embodied relations between screen devices and users. It addresses issues such as the physical proximity and pose characteristic of screen use in particular contexts and the significance of how devices are held. It also considers other physical reactions, such as the potential stimulation of the taste buds, or the skin, even though the stimulation is the effect of synaesthesia (consolidation and interaction between the senses) and not of real, tangible contact with those organs.
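As an illustration of the first procedure, a shot-by-shot transcript could be stored as structured records, one per shot, using the categories listed above. This is only a sketch under stated assumptions: the `ShotRecord` class, its field names, the example values, and the `videos_with_effect` helper are hypothetical and are not the coding instrument used in the study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShotRecord:
    """One coded shot. Fields follow the categories named in the text;
    the names and value vocabularies are hypothetical."""
    video_id: str
    visual_content: str
    shot_type: str
    frame_size: str
    point_of_view: str
    editing_effects: List[str] = field(default_factory=list)
    caption: str = ""
    sound: str = ""


def videos_with_effect(shots, effect):
    """Return the ids of videos in which at least one shot uses the effect."""
    return {s.video_id for s in shots if effect in s.editing_effects}


if __name__ == "__main__":
    # Invented example records, for illustration only.
    shots = [
        ShotRecord("v1", "hands mixing batter", "overhead", "medium",
                   "cook", ["fast-forward"], caption="Mix", sound="music"),
        ShotRecord("v2", "melting cheese", "extreme close-up", "close", "side"),
    ]
    print(videos_with_effect(shots, "fast-forward"))
```

Records of this kind make it straightforward to check how widely a technique recurs across a corpus, for example whether an effect appears in every video.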
Visual, Textual, and Auditory Attributes of the Videos
Our analysis of the visual, textual, and auditory attributes of the videos revealed a twofold process of remediation, the representation of one medium in another (Bolter and Grusin, 1999). First is the transformation of the recipe from what was previously primarily a textual genre (including both writing and still images) into a video, a process mainly but not exclusively signaled through the use of verbal captions in the video. Second is the adaptation of the video for the stimulus-crowded milieu of the Facebook interface and the interactive digital screen. Hence, the editing is attuned to the frantic environment of the feed, and thanks to explicit time-compression techniques such as fast-forward (which was used in every video), the videos can depict the full preparation and baking of something as complicated as a layered cake in only 50 seconds. Additionally, the formal structure of the videos more closely resembles that of narrative cinema, with plotting, moments of tension, and climax, than the more abstract schemas of traditionally written recipes. Each video opens with a shot of an empty workspace; this is followed by the appearance of hands placing implements and ingredients in the workspace and performing a variety of actions such as mixing, cutting, and cooking. In every video, this sequence culminates in a shot of a spoon or fork entering the space of the frame, digging into the prepared food, and conveying a (presumably tasty) morsel to an off-screen mouth (Figure 1).
Figure 1. Typical final shots of a Tasty video (Tasty, 2017a).
Three techniques were particularly pertinent to the remediation of video for Facebook: camera angle, image composition, and sound and captions.
Camera angle: In contrast to contemporary television cooking programs, the Tasty videos are filmed from above. More specifically, television programs prominently feature the face of the chef, who is also presented by name, acts as the program’s central character, and whose facial expressions and verbal utterances help mediate the taste and smell of the food to the audience (Ketchum, 2005). Furthermore, celebrity chefs also connect the audience with certain values and sociability (Lewis, 2010). In Tasty videos, the film is shot from the point of view of the chef, who is thereby made invisible to the viewer, with the crucial exception of her or his hands. Indeed, the hands in the video are anonymous, even generic: they do not make a connection to a specific individual (though they do imply membership of social groups through judgments of skin color and gender). The depiction of hands alone moves the viewer beyond the narrative to a focus on embodied practices (Lewis, 2018). Produced by the overhead camera angle, this view of the chef’s hands creates possibilities for viewer personalization and even incarnation since viewers can sense that these hands could be theirs, that they are continuous with their own flesh (in-carnate), as they are being viewed from precisely the angle one sees one’s own hands (Figure 2). This magnifies the potential for an active, embodied connection with the viewer, inviting a sense of ‘I can also do this’. Interestingly, this adoption of point-of-view images for recipe texts is not new: it characterizes Japanese cookbooks (Martinec, 2003) with similar possibilities for viewer involvement: ‘Practically all the Japanese recipe photographs are taken over the cook’s shoulder […] and so the viewer’s point of view is identified with the cook’s point of view. This is the greatest degree of engagement – the viewer is made to identify with the represented cook’ (Martinec, 2003: 50).
Figure 2. Hands in a Tasty video (Tasty, 2016a).
Composition: The overhead point-of-view camera angle intersects with key compositional characteristics of the Tasty videos – their perspectival flatness and the prominence of neatness and clean layouts. Perspectival flatness, or the construction of a two-dimensional viewpoint, is a key convention of data visualization (Kennedy et al., 2016). Two-dimensional viewpoints invoke a ‘God-like’ view (Kress and Van Leeuwen, 1996) in which no objects are obscured from sight and where all is arranged to the benefit of a masterful ‘view from nowhere’ – which actually disguises a powerfully constructed view from somewhere. The two-dimensional viewpoint is predominant in all the videos in our corpus (interspersed with other shots), potentially reinforcing the mastery of viewers over this perfectly ordered specular space precisely as they are encouraged to identify with the hands operating within it.
Another feature noted by Kennedy et al. is also apparent: the prominence of clean layouts. In data visualizations, the neatness of lines and cleanness of surfaces are associated with the supposed objectivity and truthfulness of diagrams. Such diagrammatic neatness is a core formal framework of the videos. The ingredients never spill from the pan onto the table, never fall off the work surface, and never even dirty the cook’s hands, unless handling them is part of the recipe. The videos thus represent a fantasy of culinary cleanliness and simplicity – a fantasy which echoes the tendency in online food videos in general to present food as flawless (Goodman and Jaworska, 2020). This sterilized space of action is given even deeper order by the prominence of simple geometric forms within the video frame. The plates, pans, and baking dishes become – in the flattened view from above – squares, rectangles, and circles (Figure 3). Gombrich (1979), also cited by Kennedy et al., stresses the connection between the desire for order and simple geometric patterns, arguing that the latter are more quickly and efficiently perceived in contexts of high levels of stimulus.
Figure 3. Simple geometric shapes in Tasty videos (Tasty, 2015; Tasty, 2017b).
Hence, the composition of the video answers two requirements simultaneously. The first enables the video to be perceived quickly and easily in the distractive and stimulus-rich context of the Facebook feed. The second fulfills a more anthropological desire for order in the very field – food, cooking – in which the messiness of action in physical dimensions is transformed into the symbolic realm of culture (Douglas, 1966).
Sound and captions: Sound in the Tasty videos is probably the most marginal signifying element. The soundtrack of the videos consists entirely of background music which is light and undramatic in tone. It is not accompanied by verbal explanations. This is hardly surprising given the introduction of Facebook’s mute auto-play feature in 2013: the videos are designed to work within the constraints of a default play mode that is silent. As a consequence, the videos make prominent use of captions, enabling viewers to watch and understand the films without sound.
In addition to its functional necessity, the use of captions has two remediating dimensions. It is a remediation of the short instructions and descriptions of quantities familiar from written recipes, but it also creates an additional resonance when combined with the use of images and a background soundtrack: silent movies. Captions – or intertitles – appeared in early cinema (in 1907–1908) when films were becoming longer and their narratives more complex, requiring additional non-pictorial explanation. They also triggered the firing of live ‘lecturers’ previously hired to provide commentary on certain movies (La Tour, 2005). Similarly, the captions in the Tasty videos enable them to bypass the necessity for aural commentary (to ‘fire the lecturer’, so to speak), which would require users to intervene via the interface by activating the (automatically muted) sound, providing a non-pictorial but visual avenue through which to make sense of the videos.
Two final attributes of the captions are worth noting: the letters are usually white, and the font is sans serif. These attributes graphically reinforce the themes of clean layouts and neat lines noted earlier (Figure 4).
Figure 4. Captions in a Tasty video (Tasty, 2016c).
Interactivity
The interactivity of a medium is often measured by its technical characteristics and its degree of responsiveness. However, interactivity is also evaluated according to the perceptions of users: a medium is interactive if perceived as such (Sohn, 2011). Indeed, ‘perceived interactivity’ is described as an attribute of social media platforms in general (Carr and Hayes, 2015), and not merely because they facilitate interactions between people. Significantly, the users’ sense that they can, if they want, influence the medium or its content is sufficient to create a perception of interactivity, even if users do not exercise that influence.
Taken at face value, the range of possibilities for user interaction and influence on Facebook videos is relatively narrow. The first interface of the Facebook video resembles the set of controls for traditional video players, including buttons for controlling volume and play/pause, with the addition of the familiar ‘scroll bar’ that shows the position of the current frame in the video (usually accompanied by a small preview window) and allows the user to move backwards and forwards within it. The second interface governing interaction with the video is Facebook’s broader ‘social interface’ that accompanies all Facebook posts and is not restricted to video (e.g. the like, share, and comment buttons and features).
These two interfaces distinguish between different modes of interactivity and user influence: the first over the viewing process itself, the second over the social uses of the video as a single entity. The first treats the video as a processual experience unfolding in time, and the second as a quasi-object to be operated upon (shared, liked, commented on, etc.). The intersection between these two is hierarchically structured to favor the visibility of the social interface over that of the video interface, since the former is consistently visible on the screen, while the latter, both on computers and on the Facebook mobile application, is hidden unless the viewer moves the cursor over the video or ‘touches’ the video via the touch screen. There are two reasons this hierarchy of interface visibility is important. First, Facebook is less interested in whether we watch whole videos than in whether we use video for social connectivity; hence, it makes sense to give priority to the social interface within a crowded visual field. Second, Tasty has already constructed its videos so that user control over playback is marginal to its perceived interactivity since it is designed primarily as an unfolding multi-sensory experience rather than as a functionally optimal cooking tutorial (where the need to freeze frames or rewind to earlier sections would be important for following a recipe).
Yet, neither the ‘video’ nor ‘social’ interface fully articulates the distinctive perception of interactivity characterizing the recipe videos. This is because the perception of their interactivity can be produced without activating any of these interfaces. Steuer (1992) emphasizes the significance of a medium’s ‘vividness’ – its sensory richness – as a component of perceived interactivity. Other research also connects vividness to perceived interactivity as well as to stronger positive attitudes to a medium over time (Coyle and Thorson, 2001; Fortin and Dholakia, 2005). Attributing greater vividness to one medium over another helps to create a heightened perception of interactivity for the more vivid medium. Hence, an animated or moving picture, or a continually reloading webpage, can appear as more ‘alive’ and interactive than a still image (Sohn, 2011; White, 2006).
Facebook video is exemplary of this condition. Much, if not most, of the newsfeed through which users continually scroll is made up of text and still images. In contrast, thanks to auto-play, videos on Facebook already appear to be running as we scroll, creating a window of seemingly autonomous movement within the otherwise static elements of the feed. One writer for the technology website Techcrunch compared the result to ‘the moving photos in Harry Potter newspapers’ (Constine, 2017). Resonating with the concept of vividness, the autonomous movement of the video captures the user’s attention before the user has pressed any buttons, creating conditions for a deceleration in scrolling or for ceasing to scroll entirely – even if only for a millisecond. In the attention economy, this fraction of a second, and this micro-temporal hiatus in a habitual gesture, is extremely valuable for granting the video a sensory advantage over competing stimuli.
Even though this pause in scrolling is transient in duration, and its visible influence over the screen is restricted to a pause in the scrolling effect, it is enacted through a distinct physical response, a change in the gestural rhythms of habitual interaction between the user’s hand and the interface that has been instigated by the movement of the video. This occurs, as we have noted, under a new regimen of eye-hand-screen relations which is continually alert to potential operative possibilities of physical interaction with digital interfaces. However, what is particularly significant regarding Facebook recipe videos – and their relation to the Facebook interface – is that the interaction with the video encourages the cessation of physical action rather than its intensification.
Video Viewing as an Embodied Sensory Experience
Recipe videos seem designed to make viewers feel as if they can touch (and perhaps, even taste and smell) the food in the videos. The most prominent shot, showing two busy hands, is frequently interrupted by mouthwatering extreme close-ups of food. These shots heighten texture and movement within the frame: the runniness of sauces spilling over plates, the sizzling of oil in pans, the viscous consistency of leavening dough, and the elasticity of melting cheese (see Figure 5). Resembling Marks’ description of haptic visuality in experimental erotic video, recipe videos fill the small screen with exaggerated movement and the unclear contours and dimensions of objects in extreme proximity, inviting a sensory exploration of the image as viewers try to figure out what they see. In this way, recipe videos appeal to the eye as an ‘organ of touch’ (Marks, 1998: 333), promoting ‘haptic vision’, which Marks and others (e.g. Boothroyd, 2009) have argued is less abstract and distant than ‘optical vision’, constructing ‘an intersubjective relationship between the beholder and the image’ (Marks, 1998: 341).
Figure 5. Extreme close-ups of movement and texture in Tasty videos (Tasty, 2016a, 2016b).
Marks’ concept of haptic visuality is, of course, a decidedly visual concept developed to account for particular features of avant-garde videos and in the context of feminist film theory. This primarily visual account of the haptic is relevant to recipe videos since the senses of touch and intimate contact they arouse are mediated through sight, and do not involve actual touch (Parisi, 2018); as we observe below, this mediation may involve an instrumentalization of touch. Moreover, the experience of watching Tasty videos appears to mobilize synaesthetic potential. Synaesthesia is ‘the union of the senses’ – a physiological condition in which certain perceptions trigger ‘unrelated sensations’ (Howes and Classen, 2014). Thus, while there is no actual interaction with the senses of taste or touch, the images make available an unexpected and seemingly inexplicable sensory pleasure (Nansen and Balanzategui, 2022) derived from the intensely detailed close-ups of food texture. The colorful imagery on the screen may arouse the sense of taste or the imagined feeling of the food’s texture from the inside of the mouth.
This synaesthetic potential of recipe videos is enhanced by their appropriateness to smartphone viewing through the positioning of the viewer’s body in relation to the space of the screen. Farman notes that space is constructed simultaneously with the sense of embodiment, while embodiment, in turn, is ‘practiced’ and habituated in relation to technologies (Farman, 2012). In the videos, such embodied practice simultaneously interweaves two distinct locations into a singular coextensive space: the filmed kitchen surface (e.g. chopping board) on the one hand, and the distance between the viewer and the screen, on the other. These are linked visually through the figure of the hands in the frame and physically through the hands of the viewer in relation to the device. Hence, the hands in the frame appear to emerge from the direction of the user and define relations of proximity with the screened content, such that users are positioned as though they are standing in front of a chopping board or cooking stove. Additionally, in the case of viewing on a smartphone screen, the hands in the video resemble extensions of the hands of the user holding the smartphone. The illusion of proximity created by this positioning situates the viewer at the close edge of the spatial zone that Hall (1969), in his theory of proxemics, calls ‘personal distance’. This zone is determined as being at ‘arm’s length’ and is ‘the limit of physical domination’ or control over objects or other people (1969: 120). Thus, as close as the hands on the screen may feel, perception and observation of what might be happening on the work surface are still manageable.
Viewing recipe videos on touch screens involves actual physical actions, potentially enhancing the intimacy of the multi-sensory experience. The cradling of the smartphone in the palm of one’s hands, and the scrolling performed by stroking the screen surface, offer pleasurable sensations which merge with feelings of ‘fit’ (Cooley, 2004) and intimate acquaintance with the device itself, opening another channel – skin – for communication or stimulation (Parisi and Farman, 2018). On the touch screen, it is one’s physical hands which navigate and scroll, which single out items by pausing over them, which enlarge them, drag them, stretch them or highlight them. These navigational gestures require intuitive physical knowledge which creates a familiarity with the experience (Parisi, 2018; Richardson and Hjorth, 2017). Such activities can collapse the boundaries between action and perception, providing an integrated experience of making and viewing at the same time (Verhoeff, 2009). In the case of the recipe videos, when the user touches the video on the screen, her hands hover over the hands depicted, creating the sense of hand upon hand. Hence, in addition to the positioning of the viewer through purely visual means (e.g. camera angle, visual distance in personal space), the very physical actions required by watching the video invite an intimate touch, fleshing out the sensory experience.
Finally, from a critical perspective, this integration of the senses can be seen as an instrumentalization of touch in the service of visually driven manipulation, an observation that echoes the concerns of recent work on touch interfaces about the dominance of vision in accounts of ‘the haptic’ that stem from film theory (see, for instance, Parisi, 2018). Such manipulation can be read as constraining, rather than enabling and expanding, the sensory capacities of the viewer – substituting the user’s ‘machine-readable’ gestural movements on device and platform interfaces for actual physical tactility (the surfaces of touch screens are notoriously smooth). We do not wholly disagree with this observation but would add that this is precisely the point of visual-touch interfaces as simultaneously disciplinary and expressive sites for embodied interaction and experience (Frosh, 2015). The videos can be understood to instrumentalize and impoverish touch because the habitual gestures of the user’s hands on the device are substituted for tactile encounters with the depicted objects. Yet, this also potentially realigns the routine relations between our senses and technologies in ways that can be conspicuously apprehended, enabling new kinds of experiential and embodied enrichment.
Conclusion
Recipe videos are, as mentioned earlier, among the most viral on Facebook. Our analysis foregrounds three dimensions – visual and auditory attributes, interactivity, and embodied experience – as crucial to these videos’ effectivity within the stimulus-crowded attention-frameworks of the Facebook interface and the digital screen. Applying this three-dimensional model to other genres would enable fine-grained comparisons between semiotic attributes, involvement structures, and embodied and sensorimotor relations across different genres and platforms. However, our analysis also suggests that while these dimensions are all crucial, they are structured towards a particular outcome: the foregrounding of the user’s body and the activation of sensorimotor experience.
We postulate that the attributes we have explored are also relevant to other highly popular and viral video genres across social media platforms. These include hand-focused videos, such as slime preparation and DIY videos, short atmospheric videos of natural landscapes, and ASMR videos. We have called these ‘hyper-sensory videos’, using the prefix ‘hyper’ to mark the overt and intense quality of sensory arousal characterizing these genres. Interestingly, such genres recall a potential historical precedent examined by Gunning (2006) in early cinema: ‘the cinema of attractions directly solicits spectator attention, inciting visual curiosity, and supplying pleasure through an exciting spectacle […] Theatrical display dominates over narrative absorption, emphasizing the direct stimulation of shock or surprise at the expense of unfolding a story or creating a diegetic universe’ (2006: 384). In contrast to the cinema of attractions, however, hyper-sensory videos utilize a new arsenal of audio-visual and interactive techniques to arouse multiple senses: not just visual spectacle, but a synaesthetic experience of embodied interaction enabled by the smartphone.
The advent of hyper-sensory videos as readily accessible cultural experiences, attuned to but also potentially disrupting the habitual embodied frameworks of social media use, suggests a parallel in the realm of popular culture to the increasing emphasis on affect in the humanities and social sciences. Hyper-sensory videos seem primarily designed to do affective work, in that the sensory arousal they produce is largely pre-discursive, unreflexive (though available for subsequent verbalization), not-yet-categorized as a particular emotion, and geared towards immediate bodily responses (Massumi, 2002) – particularly the movement of users’ hands on their digital devices.
This affective work should not, however, be placed beyond cultural parameters. To give one example, when a Tasty recipe video for a non-kosher dish (i.e. that violated Jewish religious dietary laws) was shared in an Israeli Facebook group, participants responded with ‘ichs!’ (Hebrew for ‘yuk!’) and ‘disgusting’. These were verbal articulations of primary sensory responses rather than disagreements over whether non-kosher recipes were permissible on the page (or anywhere else). The response to the violation of dietary laws was articulated as an immediate bodily reaction. Culture and affect, at least in this example, seem to be co-organized.
This observation contributes to our understanding of social media and digital interfaces as operating within technologized affective economies (Ahmed, 2004). Hyper-sensory videos such as those we have analyzed organize the sensory relations between physical bodies, depicted bodies, and objects in ways that are conducive to the operations of the media technologies which bring these components together – the ongoing flow (Lupinacci, 2021) of the social network feed. No less significantly, they emphasize the existence and salience of highly sensorial relationships with technologies that are frequently described as ‘virtual’ and ‘intangible’, with media such as video, which are conceptualized as mainly visual, and with genres that are often considered trivial.
Contemporary innovations such as haptic technologies and industrial and public discourse concerning ‘the metaverse’ suggest that ours is an era of renewed interest in tangibility and presence (Dourish, 2017; Gumbrecht, 2014; Jütte, 2008). Public discussions of sensory arousal and technologically generated embodiment have themselves become a part of contemporary digital culture. Yet, in contrast to computer games, virtual reality, and similar technologies, hyper-sensory videos are encountered in the frenetic and highly distractive contexts of social media, which do not allow for fully immersive engagement. Instead, they punctuate everyday social media use with moments of intensified, multi-sensory stimulation, which also provide potentially reflexive experiences of such arousal: users describe Tasty videos as ‘one-minute meditations’, DIY videos as ‘satisfying’, and ASMR as ‘braingasm (brain orgasm)’ (Andersen, 2015; Gallagher, 2019; Nansen and Balanzategui, 2022). Hence, recipe videos on social media are more than a frivolous trend: they are part of a conspicuous cultural response to broader questions regarding materiality, presence, and embodied relations within a highly mediated social reality.
Acknowledgments
The authors would like to thank Ifat Maoz, Limor Shifman, and Lillian Boxman-Shabtai for their feedback on the earlier stages of this work, and the journal’s editors and reviewers for their helpful comments. The authors would also like to thank the Mandel Scholion Interdisciplinary Research Centre scholarship program.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
