Abstract
This article examines the relationship between prediction and serendipity in the short-video social media platform TikTok, analyzing its recommendation algorithm through the lenses of affective and pragmatic turns in cognitive science. By looking at TikTok’s user experience, I demonstrate that while predictive models are crucial for user engagement, elements of surprise and unpredictability are equally essential for maintaining user interest. The study draws on theories of perception, emotion processing, and affect to provide a comprehensive understanding of the cognitive and embodied dimensions of digital social media experiences. I argue that TikTok’s success lies in its unique integration of both predictive accuracy and serendipitous discovery, creating an “indeterminacy center” that keeps users engaged. This research contributes to the broader understanding of social media dynamics, offering insights into the balance between prediction and serendipity in digital platforms.
Keywords
Introduction
The discourse around artificial intelligence (AI) frequently emphasizes its predictive capacities, which popular science outlets often highlight with headlines such as “Predicting the future is now possible with powerful new AI simulations that can model every conceivable social interaction” (Scientist, 2019). Such confidence underpins a wide range of applications, from healthcare systems to predictive policing systems like PredPol, used by police bureaus in several countries (Maguire, 2018: 79). One of the primary applications of predictive models is in recommendation algorithms, which determine, for example, what content will be provided to each user in social media applications such as TikTok. Recommender systems focus mostly on optimizing prediction accuracy to ensure that suggested content closely matches user preferences (Kaminskas and Bridge, 2016; McCay-Peet et al., 2014). However, several studies (Cruz and da Trindade, 2023; Herlocker et al., 2004; Makri et al., 2014; Toms, 2000) have emphasized the significance of additional factors beyond accuracy, such as diversity, novelty, coverage, and serendipity, which enhance user engagement and improve recommendation quality. Among these, serendipity, defined as the ability to deliver unexpected yet relevant suggestions, has attracted particular interest (Cruz and da Trindade, 2023; Kaminskas and Bridge, 2016; Kischinhevsky et al., 2024; Krook, 2025; Reviglio, 2019).
This study examines the role of prediction and serendipity in TikTok’s recommendation algorithm through the perspective of cognitive science’s pragmatic turn, as well as affect theory. The pragmatic turn in cognitive science has put prediction at the center of its theories, arguing that the brain’s primary function is to forecast reality based on prior knowledge (Engel et al., 2015). Cognitive scientist Andy Clark, in his book “Surfing Uncertainty: Prediction, Action, and the Embodied Mind” (Clark, 2016), describes “a general, and increasingly well-supported, vision of the brain (and especially the neocortex) as fundamentally an inner engine of probabilistic prediction” (Clark, 2016: 27–28). In addition, to address the serendipity and indeterminacy of social media experience, I turn to affect theory and speculative pragmatism in order to uncover how affect, emotion, and prediction interact in the digital realm.
TikTok’s recommendation algorithm is analyzed by engaging with studies that utilize a variety of research methods, including ethnographic observations, cognitive neuroscience experiments, and content analysis. This study contributes to a broader understanding of how affective and algorithmic dynamics on social media platforms like TikTok can inform the design of more diverse, engaging, and ethically responsible digital environments. In particular, it explores how the interplay between prediction and serendipity in algorithmic design can provide a framework for rethinking recommendation systems—not merely as tools for engagement, but as structures capable of fostering diverse and well-informed algorithmically curated media environments. Drawing on insights from cognitive science and affect theory, this study aims to contribute to the fields of new media and communication studies by advancing the understanding of embodied dimensions of digital media experience (perception, reward learning, and emotion processing), particularly regarding TikTok’s recommendation algorithm. In addition, this research contributes to the interdisciplinary field of affective computing (Pei et al., 2024; Picard, 1997) which seeks to design computational systems capable of perceiving, interpreting, and responding to human emotions in intelligent, contextually sensitive, and socially meaningful ways.
TikTok’s recommendation algorithm
TikTok, a popular short-video application launched in 2016 by the company ByteDance, reached over 1 billion monthly active users globally in 2024, with an average daily time spent by users of 58 minutes and 24 seconds (Team SAsC, 2024). Videos on TikTok are mostly short (under 3 minutes), although there is a new trend toward longer videos. Upon opening the application, users are immediately directed to the “For You” page, a highly personalized, vertically scrolling feed that continuously presents algorithmically recommended short videos. The transition from one video to another is made with a swipe gesture from the bottom to the top of the screen. The interface features a full-screen video display and interactive elements, including like, comment, share, and follow buttons, positioned along the right-hand side of the screen. A bottom navigation bar provides access to additional features such as the Discover page (for trending content and hashtags), the Create button (allowing users to record and edit videos), and the Profile tab (which displays a user’s content and interactions). The platform also integrates real-time engagement features, such as livestreams, duets, and stitches. In this section of the article, I look at literature using a variety of methods to study TikTok’s recommendation algorithm and how it affects user experience. Engaging with the literature on TikTok’s recommendation algorithm and the theoretical frameworks, I observe three central dimensions of the user experience of recommendation algorithms: they are relational (the algorithms shape user experience, and vice versa), processual (recommendation algorithms keep evolving in time), and embodied (cognitive and affective user experience mobilizes and shapes the body).
In an ethnographic study conducted in the United Kingdom in 2019 and 2020 among TikTok users, the application was often described as a “feel good space,” a tool that helped to digest the day and clear the head after a day of work (Schellewald, 2021: 1574). The role of fulfillment and distraction in media experiences, such as in soap operas or television shows, is nothing new, as researchers of television have long described (Schellewald, 2021). As in television experience, there is also an important social aspect present on TikTok. Even though the application is mostly used individually on mobile phones, sharing videos with friends is an important part of the media experience. Despite various parallels that can be established between television and digital social media experience, TikTok has personalization features inscribed in its recommendation algorithms that significantly affect this particular media experience. As Zhao and Wagner describe, “a much-discussed topic related to TikTok’s technology is its AI-based recommendation algorithm, considered by many the secret to the app’s success” (Zhao and Wagner, 2023: 826).
Recommendation algorithms are ubiquitous in social media experience nowadays, trying to predict content that meets users’ interests. TikTok’s recommendation algorithm, particularly the algorithm that curates the videos shown on the “For You” landing application page, is acknowledged as one of the main aspects boosting the number of views of videos on the application (Zhang and Liu, 2021). Zhao and Wagner (2023) note that TikTok’s user experience is driven by its recommendation algorithm, not user searches. The application actively pushes content to users, significantly reducing their control over what they see and shaping their overall experience on the platform. A Wall Street Journal investigative study (Matt and Stern, 2021) aiming to analyze the mechanisms underlying TikTok’s recommendation algorithm observes that TikTok operates almost entirely on algorithmic curation, with approximately 90–95% of the videos seen by users being selected by the recommendation system rather than appearing due to direct engagement, such as follows or subscriptions (for comparison, the study estimated that around 70% of videos watched on YouTube come from algorithmic recommendations, while the remaining 30% result from direct user engagement). The study involved the creation of multiple automated bot accounts, each simulating a new user with no prior engagement history. These accounts were programmed to consume content, avoiding explicit interactions such as liking, sharing, or commenting. Instead, the primary behavioral variable under observation was watch time, which refers to the duration an account spent viewing a particular video before scrolling to the next.
The findings revealed that TikTok rapidly personalizes content based on watch time: when an account spent time watching videos related to a specific topic, such as mental health, conspiracy theories, or political content, the algorithm promptly adjusted by curating a feed that increasingly emphasized similar material. This reinforcement loop demonstrated how TikTok’s algorithm is not merely reflective of user interest but actively shapes and amplifies content exposure. A key insight from the study was the speed and efficiency with which TikTok categorizes users. Unlike other social media platforms, such as YouTube or Facebook, which rely on explicit user actions (e.g. follows, likes, and shares), TikTok’s system is highly responsive to subtle behavioral cues. The investigation found that the platform could establish a user’s primary interests within approximately 2 hours, crafting a personalized feed based solely on viewing patterns. According to the study, watch time is by far the most influential factor in TikTok’s recommendation algorithm: if a user lingers on a video, even briefly, the algorithm takes note and serves more content of the same nature. However, other factors also play a role. TikTok tracks video interactions (likes, shares, and comments) and analyzes content metadata, including hashtags, captions, and even the sounds used in videos, to better categorize and distribute content to the right audience. In addition, user account details, such as language settings, device type, and location, can influence recommendations, though they carry less weight compared to behavioral data.
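The watch-time logic reported by the investigation can be illustrated with a toy example. The following Python sketch is purely illustrative and does not reproduce TikTok’s actual system: the function names, the topic labels, and the two numeric weights are hypothetical, chosen only to show how a profile built mainly from watch time can converge on a dominant topic without any likes or comments.

```python
from collections import defaultdict

# Hypothetical weights: the investigation reports only that watch time
# outweighs other signals, not any numeric values.
WATCH_WEIGHT = 1.0
INTERACTION_WEIGHT = 0.3


def update_interest(profile, topic, watch_fraction, interacted=False):
    """Add one viewing event to a topic-interest profile.

    watch_fraction is the share of the video that was watched (0.0 to 1.0).
    """
    profile[topic] += WATCH_WEIGHT * watch_fraction
    if interacted:
        profile[topic] += INTERACTION_WEIGHT
    return profile


def rank_topics(profile):
    """Return topics ordered from strongest to weakest inferred interest."""
    return sorted(profile, key=profile.get, reverse=True)


profile = defaultdict(float)

# A passive account that never likes or comments, but lingers on one topic.
events = [
    ("cooking", 0.2),
    ("mental_health", 0.9),
    ("sports", 0.1),
    ("mental_health", 1.0),
    ("cooking", 0.3),
]
for topic, watch_fraction in events:
    update_interest(profile, topic, watch_fraction)

top_topics = rank_topics(profile)
print(top_topics)
```

In this sketch, the topic on which the account lingered ends up ranked first even though no explicit interaction was recorded, which mirrors the behavior of the passive bot accounts described in the study.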
TikTok’s recommendation algorithms are in a constant state of refinement, incorporating advanced data analysis and AI learning methods to handle an enormous volume of daily video views—over a billion (Zhou, 2024). Recent reporting indicates that ByteDance has consolidated its position as a global leader in AI, channeling substantial investments into large-scale systems that extend well beyond the sphere of social media and into fields such as education, productivity applications, and healthcare (Tobin, 2025). This expansion underscores the growing influence of AI across diverse sectors and highlights the extent to which algorithmic systems are shaping cultural, economic, and social practices. At the same time, it raises pressing questions about how, in an era defined by rapid technological change, societies might navigate the long-term tension between algorithmic prediction and indeterminacy. For recommendation algorithms in particular, addressing this tension between the predictive modeling and the openness of more uncertain, serendipitous encounters constitutes an important avenue for future research, especially given TikTok’s scale and its ability to shape media consumption habits globally.
From the perspective of user practices, TikTok’s algorithms evolve in constant interaction with behavior on the platform. Schellewald (2021) notes that, over time, the user’s behavior on the application can influence the recommendation algorithm. The personalization of the recommendation algorithm needs data on the user’s behavior and preferences: “the ‘For You’ quality of TikTok that renders it appealing as ‘feel good space’ is nothing inherent to the app as such but rather actively constructed in situated practices of reading TikTok for personalization” (Schellewald, 2021: 1577). On its website, TikTok encourages users to curate personalized feeds (TikTok, 2020) and gives some information on how the system works: “Your first set of likes, comments, and replays will initiate an early round of recommendations as the system begins to learn more about your content tastes” (TikTok, 2020). It is important to note that the recommendation algorithm has relational and processual dimensions that must be taken into account. As Hagar and Diakopoulos demonstrated, “the interplay between user feedback and algorithmic recommendation shapes the user’s platform experience” (Hagar and Diakopoulos, 2023: 2).
The influence of the user’s behavior in shaping TikTok’s recommendation algorithm can be addressed through the notion of algorithmic imaginaries. Algorithmic imaginaries, as discussed by Gandini et al. (2023), refer to the collective perceptions, feelings, and understandings that users develop regarding the functioning of algorithms in their everyday digital experiences. Rather than focusing on the technical operations of recommendation algorithms, this concept highlights the ways in which users construct their own interpretations of algorithmic influence based on their interactions with digital platforms. As social media algorithms remain largely opaque, users form speculative models to explain why they see certain content, how their behaviors shape future recommendations, and to what extent they retain control over their online experiences. TikTok’s recommendation algorithm exemplifies this dynamic: many users perceive the algorithm as an entity that understands them deeply, sometimes even better than they understand themselves, reinforcing the idea that it actively shapes their content consumption rather than simply responding to their preferences.
Serendipity in recommendation algorithms
Research in the field of recommender systems has traditionally prioritized accuracy, ensuring that predicted ratings closely align with users’ preferences (Kaminskas and Bridge, 2016; McCay-Peet et al., 2014). However, some studies (Cruz and da Trindade, 2023; Herlocker et al., 2004; Makri et al., 2014; Toms, 2000) have also highlighted the importance of beyond-accuracy objectives, such as diversity, novelty, coverage, and serendipity, which contribute to a more engaging and effective recommendation experience. Among these, serendipity, the ability to surprise users with unexpected yet relevant recommendations, has gained particular attention (Cruz and da Trindade, 2023; Kaminskas and Bridge, 2016; Kischinhevsky et al., 2024; Krook, 2025; McCay-Peet et al., 2014; Reviglio, 2019). Unlike novelty, which emphasizes new content, serendipity focuses on recommendations that are both unforeseen and positively surprising, potentially broadening users’ interests. Optimizing for serendipity presents challenges, as it requires balancing familiarity and discovery, ensuring that recommendations are not just unexpected but also meaningful (Kaminskas and Bridge, 2016). To enhance serendipity, various techniques have been explored. Some methods focus on identifying items that are distant from the user’s profile, ensuring a break from typical patterns. Others use graph-based approaches to find “bridging” items, that is, items that connect separate areas of interest in a user-item network. However, measuring and optimizing serendipity is inherently challenging, as unexpectedness is difficult to define quantitatively (Makri et al., 2014).
ByteDance addressed some of these challenges in one of its recommendation systems, Monolith (Liu et al., 2022). Monolith continuously updates its model parameters using real-time feedback, allowing the system to surface content that feels fresh and unexpected, yet relevant. Unlike traditional batch-trained models, which may reinforce familiar patterns, Monolith updates recommendations at minute-level intervals, ensuring that content reflects the latest user interactions. This real-time adaptation makes it possible to introduce surprising yet engaging elements, such as emerging trends or niche content, that users might not have actively searched for but still find compelling. The Monolith experience offers key insights into why TikTok’s algorithm excels at delivering serendipitous recommendations. Many authors (Zhang and Liu, 2021; Zhao and Wagner, 2023) have observed that creating surprising, unexpected, and niche recommendations is an essential aspect of TikTok’s algorithm. Unlike most applications, which emphasize improving the accuracy of their recommendation algorithms, an approach “which can easily produce the ‘information cocoon’ trap” (Zhao and Wagner, 2023), TikTok has improved serendipity-related factors rather than only improving the precision of the algorithm.
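The first family of techniques mentioned above, which looks for items distant from the user’s profile, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a method taken from the cited literature: it assumes that each video is represented by a small topic vector, that a separate model has already produced a relevance estimate, and that serendipity can be approximated as relevance multiplied by distance from the user’s viewing history. All names and numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unexpectedness(item_vec, history_vecs):
    """One minus the similarity to the closest item already seen."""
    return 1.0 - max(cosine(item_vec, h) for h in history_vecs)


def serendipity_score(relevance, item_vec, history_vecs):
    """Reward items that are both relevant and far from the user's history."""
    return relevance * unexpectedness(item_vec, history_vecs)


# Hypothetical topic vectors for videos the user has already watched.
history = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

# Each candidate: (assumed relevance estimate, topic vector).
candidates = {
    "familiar": (0.9, [1.0, 0.0, 0.0]),  # very relevant, nothing new
    "random": (0.1, [0.0, 0.0, 1.0]),    # new, but barely relevant
    "adjacent": (0.7, [0.5, 0.5, 0.0]),  # fairly relevant and fairly new
}

scores = {
    name: serendipity_score(relevance, vec, history)
    for name, (relevance, vec) in candidates.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

The sketch makes the balancing problem concrete: the most familiar candidate scores zero because it is not unexpected, the most distant candidate scores low because it is not relevant, and the candidate that combines both properties is ranked first. The choice of a simple product is one of many possible formulations, which reflects the difficulty of defining unexpectedness quantitatively.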
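The contrast between batch training and continuous updating can be illustrated with a toy online learner. The following Python sketch is not Monolith’s architecture and makes no claim about how ByteDance implements real-time training; the class name OnlineScorer, the three one-hot features, and the learning rate are hypothetical. It shows only the general principle that a model updated after every feedback event can start surfacing an emerging topic after a handful of interactions.

```python
import math


class OnlineScorer:
    """Tiny logistic model updated one interaction at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        """Estimated probability that the user engages with the video."""
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, engaged):
        """Apply one gradient step as soon as a feedback event arrives."""
        error = self.predict(features) - (1.0 if engaged else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


# Hypothetical one-hot features: [dance, cooking, emerging_trend]
trend_video = [0.0, 0.0, 1.0]

scorer = OnlineScorer(n_features=3)
before = scorer.predict(trend_video)  # no evidence yet

# The user engages with five videos from the emerging trend.
for _ in range(5):
    scorer.update(trend_video, engaged=True)

after = scorer.predict(trend_video)
print(before, after)
```

A batch-trained model would keep returning the initial estimate until its next scheduled retraining, whereas the online learner’s estimate for the new topic rises with each interaction.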
Zhang and Liu (2021) observe that TikTok, through its recommendation algorithm, is not only able to accurately recommend videos that users are interested in but also allows them to expand into new topics that they may be interested in, thereby appealing to user demand for novelty and serendipity. (p. 846)
The unpredictability of content on TikTok is thought to evoke surprise and delight in the user, encouraging immersion and engagement. Zhao and Wagner (2023) found that perceived recommendation serendipity positively influences the user experience on short-video platforms like TikTok, potentially explaining the application’s engaging nature. They observe that “perceived recommendation serendipity”—the extent to which users feel recommended content is surprising and exceeds expectations—was introduced to counter “filter bubble” or “overfitting” in TikTok’s recommendation system, where users are shown increasingly narrow content based on their previous behavior (Zhao and Wagner, 2023). This unpredictability in content delivery significantly contributes to users’ engagement as the excitement and feeling of “what is next?” that each video creates in the expectation of the next one is a powerful engagement force, capturing and maintaining user attention, and sometimes driving users to “doomscrolling” (Cruz and da Trindade, 2023). Defined as automatic, compulsive, and immersive navigation in social networks, doomscrolling represents a state where users find themselves unable to stop scrolling through content for extended periods. The relationship between doomscrolling and serendipity is particularly salient, as research suggests that serendipity can contribute to doomscrolling behavior (Cruz and da Trindade, 2023). When users stumble upon engaging content unexpectedly, it creates a reward mechanism that motivates continued platform use.
While this article primarily examines recommendation algorithms, it is crucial to acknowledge TikTok’s parallel exploration of serendipity within its user interface and experiential design paradigms. Cruz and da Trindade (2023) have pointed out that TikTok’s multiple feed types, including the “For You” page, Following feed, Live feed, and Music feed, offer different pathways for unexpected content discovery. Moreover, its search mechanism incorporates auto-complete suggestions, voice search capabilities, and trending topic recommendations, all of which increase opportunities for serendipitous encounters. The mixture of content from both followed and non-followed accounts further enhances the potential for unexpected discoveries. Zhao and Wagner (2023) also discuss Krug’s principle for effective technology design, which emphasizes simplicity and ease of use. The idea is that users should be able to navigate and use a product without conscious effort, reducing mental strain and improving overall user experience. Looking through affective and pragmatic perspectives to understand media experience, one notices the important role played by non-conscious embodied cognitive processes.
Prediction and the pragmatic turn in cognitive science
To address the embodied and cognitive dimensions of TikTok’s recommendation algorithm user experience (perception, reward learning, and emotion processing), I turn to cognitive science, and particularly theories of the pragmatic turn (Barrett, 2017; Bergson, 2008; Engel et al., 2015), which has put at the forefront of cognitive science the idea of the brain as an action-oriented prediction generator. I turn to two contemporary frameworks to better grasp what can change in our understanding of some of the important aspects of TikTok experience: the theory of sensorimotor contingencies (O’Regan and Noë, 2001) and the theory of constructed emotions (Barrett, 2017). As Fingerhut (2021) argues, “digital media are not disembodied media,” and the “embodied habits and skills employed when engaging cultural artifacts constitute a central level of description” (Fingerhut, 2021: 2). Building on their insight into “how experiential domains are generated through embodied media habits” (Fingerhut, 2021: 2), I contend that the pragmatic turn in cognitive science offers a productive framework for analyzing how medium-specific habits shape cognitive engagement at the levels of perception and affect. From this perspective, TikTok’s recommendation algorithm can be understood not only as a technical system but also as a medium that actively shapes user cognitive engagement and affective experience.
Theory of sensorimotor contingencies
In “A Sensorimotor account of vision and visual consciousness,” O’Regan and Noë (2001) describe vision not as an internal representation of the outside world, but as something closer to dancing, or “a mode of exploration of the world that is mediated by knowledge of sensorimotor contingencies” (p. 940). The authors argue that through lived practice, over the course of life, the body acquires “knowledge” about how objects and other bodies will appear and behave according to their position, material, lighting, and every other perceived aspect of it. According to O’Regan and Noë (2001), each visual stimulus and attribute encountered will have “particular sets of sensorimotor contingencies associated with it” (p. 945). Once a set of sensorimotor contingencies is learned, it becomes latent and is actualized every time this particular set is experienced again. This is what O’Regan and Noë call the “knowledge of sensorimotor contingencies,” which is at the basis of visual perception. Like dancing, seeing requires practice, knowledge, and mastery. Just like knowing how to walk or to ride a bike, the knowledge of sensorimotor contingencies is practical and embodied, and it should not be understood as a verbal or propositional ability.
Moreover, as in partner dancing, one must consider the movements and actions of the other and the environment of the interaction. O’Regan and Noë argue that seeing is also about the interaction between the person who sees, the object that is being seen, and the environment where the interaction is taking place. It is thus impossible to establish a unique and direct link between brain states and visual experiences, as they are the result of complex relations mediated by knowledge of sensorimotor contingencies. As they explain, “exactly the same neural state can underlie different experiences, just as the same body position can be part of different dances” (O’Regan and Noë, 2001: 966). Acquiring knowledge about sensorimotor contingencies means being able to predict the effects of one’s actions and to adapt to new perceptions.
The constructed aspect of visual perception is also discussed by psychologist and neuroscientist Lisa Barrett in the development of the theory of constructed emotions (Barrett, 2017). Echoing the pragmatic turn in cognitive science, they highlight that perception is not simply a reaction to external stimuli or an internal representation of the world, but a process that starts with predictions originating in the brain, more specifically in the interoceptive network, that are then confronted with the signals perceived. In visual perception, the predictions generated in the brain travel to the primary visual cortex, where they meet the map of the visual field coming from the retina, and when there is a difference between expectations and actual input, adjustments in behavior and predictions are made. For Barrett (2017), “this efficient, predictive process is your brain’s default way of navigating the world and making sense of it” (p. 60). They argue that brain anatomy itself shows evidence that most of the activity in the visual system actually originates in the brain, and not as a neural response to light signals coming in through the retina: “90 percent of all connections coming into V1 [primary visual cortex] carry predictions from neurons in other parts of cortex. Only a small fraction carries visual input from the world” (Barrett, 2017: 61). Thus, visual perception is the encounter of predictions originating in the interoceptive networks of the brain with the signals coming from the optic nerves activated by external stimuli. Interoceptive network predictions play a central role in visual perception (or any type of perception for that matter, but the scope of this article will be limited to visual perception), as Lisa Barrett (2017) highlights: “You might think that in everyday life, the things you see and hear influence what you feel, but it’s mostly the other way around: that what you feel alters your sight and hearing. Interoception in the moment is more influential to perception, and how you act, than the outside world is” (p. 79).
In the book Cloud Ethics, Louise Amoore (2020) situates algorithms within a genealogy of perception technologies, arguing that they, like past innovations such as the printing press and cinema, transform how people perceive and interact with the world. Rather than viewing algorithms as mathematical knowledge forms with a “status of objective certainty and definiteness in an uncertain world” (Amoore, 2020: 9), the author argues that they are embedded with assumptions, biases, and probability weightings, which not only influence their outputs but also shape their ethical and political existence. Drawing on Henri Bergson and aligned with the pragmatic turn in cognitive science, the author emphasizes that perception, whether through the human eye, a camera, or an algorithm, is always selective and attuned to action, filtering reality into what is deemed relevant. Rather than replacing human perception, recommendation algorithms extend and reshape the ways things become perceptible, extracting and reducing features to construct new modes of seeing and knowing. Amoore draws on the concept of fabulation to acknowledge the inherent uncertainty and indeterminacy in the process of algorithmic creation: the algorithm’s actions and decisions are not fixed but are constantly being rewritten and adjusted based on new data and interactions.
In the case of TikTok, one can argue that recommendation algorithms mimic the knowledge of sensorimotor contingencies, trying to learn users’ preferences to predict their interests, shaping what is perceived by users, and constantly adjusting to each user’s behavior. Moreover, the repetition of use and gestures on the application (each video swipe, each like, each millisecond spent or not on a video) contributes to the algorithmic knowledge of sensorimotor predictions, which aims to engage users and keep their attention on the application. Research has demonstrated that the personalization of the recommendation algorithm can change brain activity, particularly in areas associated with the value of reward. A study by Su et al. (2021) looking at brain activity related to TikTok’s recommendation algorithm used fMRI to compare participants’ brain activation in response to personalized video recommendations versus a general feed (without any personalization). The results showed that while both types of recommendation (from the personalized and the general feed) activated brain areas related to reward learning, such as the ventral tegmental area (VTA), substantia nigra (SN), and nucleus accumbens (NAc), only the VTA was selective to personalization and was significantly more activated while participants looked at personalized recommendations (Su et al., 2021). Research has shown that both VTA and SN play crucial roles in reinforcement learning and motivated behavior. However, these areas seem to have different functions in transmitting motivational signals: SN neurons primarily process saliency, while VTA neurons process reward value. Since both personalized and generalized videos contained dynamic visual and auditory elements, the strong activation of SN likely reflects a neural response to the overall saliency of video content. On the other hand, the selective activation of VTA in response to personalized videos might suggest a difference in the reward value of personalized videos compared to generalized ones (Su et al., 2021).
Emotion processing
Lisa Barrett (2017) argues that the mode of functioning of visual perception described above can also describe the way emotions are generated in the brain. As they explain, every perception and every emotion is the result of categorization using concepts that one has learned through experience: “categorization constructs every perception, thought, memory, and other mental event that you experience (. . .)” (Barrett, 2017: 86). In categorizing, concepts are mobilized to explain and predict whatever one is experiencing. In categorization, two processes work together in opposite flow directions: a bottom-up hierarchy where primary sensory regions fire neurons to signal bodily sensations to “higher” brain regions, and a top-down hierarchy sending predictions from the interoceptive network to the primary sensory regions.
These predictions are goal-based concepts, or dynamic summaries of the many instances belonging to a goal-defined category constructed by the brain. Barrett gives the example of the goal-based concept “things that fly,” which could include animals like birds and insects, but also airplanes, helicopters, or even a baseball or a dart. Goal-based concepts are “flexible and adaptable to the situation” and they allow a person to establish similarities between objects that may differ in every other aspect. It is important to highlight that categorization is not an effortful and conscious process preceding experience, but rather a “rapid, automatic” process “performed constantly by your brain, in every waking moment, in milliseconds, to predict and explain sensory input that you encounter” (Barrett, 2017: 86). They describe emotions not as “reactions to the world,” but as “constructions of the world” (Barrett, 2017: 104) oriented by these goal-based concepts. A parallel can be drawn between the knowledge of sensorimotor contingencies described by O’Regan and Noë in visual perception and the process of concept categorization in the theory of constructed emotions developed by Barrett, in the sense that both theories have prediction and goal-based action as a main component of cognitive and emotional processes.
Complementing this idea of cognition as prediction based on goal-based concepts, pragmatic precursor Henri Bergson brings forth the indeterminacy that lives in the interval between action and reaction in any living being. For Bergson (2008), the nervous system acts to create an interval between the perceived action and the reaction, and in this interval lies the possibility of creating a new form: “Each advance of the nerve centers by giving the organism the choice between a greater number of actions, would appeal to the virtualities capable of surrounding reality, thus loosening the stranglehold, and let consciousness pass more freely” (p. 80). For Bergson (2008), a living body is a reservoir of indeterminacy, as it unpredictably expands the field of possible reactions. Indeterminacy is thus at the origin of creation, of the emergence of the new, of “the continuous creation of unpredictable novelty that seems to continue in the universe” (p. 1). In this perspective, sensorimotor contingencies, prediction, and goal-based concepts can tell only part of the story. Indeterminacy, unpredictability, and serendipity are also important characters in any media experience, or any experience tout court.
Sentiment analysis, an aspect of affective computing dedicated to labeling and forecasting the emotional effects of natural language content, has also put prediction at the center of its models and algorithms (Pei et al., 2024). As Susanna Paasonen (2023) describes, “[s]tarting with likes, this economy has expanded to broader attempts to measure and modulate intensities and qualities of feelings” (p. 85). Paasonen discusses the limitations and complexities of using sentiment analysis, particularly in predicting and quantifying emotional responses. While companies attempt to categorize affect and emotion into neat, taxonomical boxes for monetization, the inherent ambiguity and variability of affect resist such simplification. As Bergson demonstrated, cognition also acts as a center of indeterminacy, and thus cannot be fully predicted or captured by data. Paasonen argues that cultural and social inquiry must account for this ambiguity, as affective experiences and emotions are complex, unpredictable, and often escape rigid classification. The author critiques the power of data capitalism to monitor, predict, and manipulate user behavior, noting that while it has significant societal, economic, and political impacts, it is not uniform or fully effective. The work of cultural and social analysis involves embracing this ambiguity, recognizing that cultural products and social phenomena can have multiple, often contradictory, interpretations and effects. Social media, in particular, is highlighted as a space where the same objects can evoke diverse and fluctuating emotional responses.
Furthermore, Amoore’s (2020) conceptualization of aberration illuminates the inherent indeterminacy endemic to algorithmic systems. Algorithmic aberration refers to instances where algorithms, designed to function logically and predictably, produce irrational, unexpected, or harmful outcomes. The author mobilizes examples such as YouTube’s recommendation system suggesting violent and inappropriate videos for children to demonstrate that algorithms can behave unpredictably. Amoore argues that these aberrations are not mere malfunctions but intrinsic to algorithmic logic itself, suggesting that algorithms do not just reflect rational computation but also incorporate elements of unpredictability and unreason, making indeterminacy a fundamental aspect of their functioning.
Affect theory is the instrument that makes it possible to study the ambiguous and indeterminate dimensions of social media experience, particularly on TikTok. TikTok’s algorithm does not simply respond to user preferences in a fixed manner but predictively constructs an emotional engagement landscape by inferring affective states from micro-temporal behaviors such as watch time, pauses, and replays. The “power of interruption” (Massumi, 1995), together with the fast cuts and rapid changes of TikTok’s scrolling, contributes to creating a suspension effect, a critical point in experience at which the user lives through the unpredictability of, and the expectation for, the next video. TikTok has created its own indeterminacy center, combining prediction and serendipity to produce an experience that is personalized but not overly fitted, and that still provides surprise and unpredictability. Affect theory holds the key to looking at the cognitive and embodied dimensions of media experiences as indeterminate and unpredictable processes.
The affective turn: the missing half-second
In his seminal 1995 text, “The Autonomy of Affect,” Brian Massumi starts by engaging with Hertha Sturm’s (1987) research on the emotional effects of television on the audience, or as they formulate: “how media-induced emotions originate, how long they last and whether they change or not” (p. 25). Massumi focuses on a 1980 study done by Sturm and their team involving three different versions of a German 28-minute TV short film for children telling the story of a snowman. The three versions were slightly different in terms of their language content: no words in the original non-verbal version, a commented factual version, and a third version of the film with a narration with emotional attributions. In Sturm’s study, different psychological and physiological measures were collected from 9-year-old participants to capture the effects of the different versions of the short films, including arousal and valence. It was observed that the factual version created the highest arousal levels in the audience but was rated as the least pleasant and was also the least remembered by the participants. On the other hand, the non-verbal version was rated as the most pleasant and had higher levels of skin activity (autonomic body response). A particular result of this study was a surprise (in their own words) for Sturm and their team: the children rated the sad scenes as the most pleasant, as Massumi (1995) put it, “the sadder the better” (p. 84). It is interesting to note that, in a recent study that used eye tracking to measure the behavior of TikTok users, Guo et al. (2023) “showed that emotions grab attention and information processing differently.” Their results showed that “negative emotions attracted more attention, which was consistent with previous studies on negativity bias, that is, negative information was more important and attracted more attention” (Guo et al., 2023). 
Interestingly, as in Hertha Sturm’s snowman study commented on by Massumi, on TikTok too it seems to be a case of “the sadder the better.” Participants in Guo et al.’s (2023) study also spent more time reading emotional text than non-emotional text, and more time on negative stimuli than on positive or neutral ones.
This apparently paradoxical result brought Massumi to formulate the distinction between affect and emotion. Noting the divergence between the two types of responses (the duration and the valence of the media effect) and the “gap between content and effect” (Massumi, 1995: 84), Massumi identifies two levels of image reception: intensity and qualification. Content, in this framework, is the process of qualification, “indexing to conventional meanings in an intersubjective context, its socio-linguistic qualification.” Content is associated with “expectation, which depends on consciously positioning oneself in a line of narrative continuity” (Massumi, 1995: 85). In parallel to the qualification process, media experience also incorporates intensity, or affect, acting “outside expectation and adaptation, as disconnected from meaningful sequencing, from narration, as it is from vital function”; it is “a suspension of action-reaction circuits and linear temporality” (Massumi, 1995). In the dimension of intensity, paradoxes exist: sad and pleasant can belong together. But when language gets involved in the process, it is the dimension of qualification that is operating. Emotions belong to qualification; they are “subjective content” or “qualified intensity,” but Massumi highlights that both intensity and qualification are embodied processes. Building on this, Massumi explores Benjamin Libet’s famous experiment on the “missing half-second,” the brief interval between brain activity and the actual execution of an action. Massumi (1995) theorizes that this elusive half-second is the home of affect, or intensity, revealing a “turning point at which a physical system paradoxically embodies multiple and normally mutually exclusive potentials, only one of which is ‘selected’” (p. 93).
In a fascinating parallel, Hertha Sturm’s own work about the emotional effects of media is followed by a chapter also titled “The Missing Half-Second,” though it offers a markedly different interpretation of the concept. I will come back to Massumi’s treatment of the missing half-second, but I first look at Sturm’s understanding. In the study, Sturm focuses on visual perception, questioning whether there is a difference between media and non-media perception. They note that perception is “in part determined by past experience and expectation” (Sturm et al., 1987: 37) and observe that, in “real-life situations” (“non-media interactions”), there is a “short period of time” separating “the expectation of an event and its recurrence.” Then, Sturm et al. (1987) compare non-media experience (or what they call “real-life,” which is curious because it supposes that media is not real life) with television experience, concluding that in television perception “we are unable to predict what’s going to happen next” (p. 38), whereas in “real-life,” because one has more time to react, it is easier to understand and process content. They argue that due to the “rapidly changing presentations,” “unpredictable flip-flops” or “speeded-up action” (Sturm et al., 1987: 39) of television, perception in “real-life” has an extra half-second when compared to television-mediated perception. This missing half-second of television perception leaves too little time for internal verbalization and causes “perceptual overload.”
Central to Sturm’s analysis of the missing half-second is the idea of verbal internalization. They argue that verbal internalization (or internalized labeling) is essential to generating expectations and experiences, and that it “represents an ordering of impinging, external stimuli and their assignment to cognitive and emotional reference systems” (Sturm et al., 1987: 38). Categorization is another term used by Sturm to refer to verbal internalization, and they argue it plays an important role in enhancing comprehension and retention of media perception. Categorization is also named as a key cognitive and emotional process by Barrett, as discussed earlier. Sturm et al. (1987) advocate for television content creators to include in their productions “half-seconds for internal labelling” to allow “the viewer to make judgments” and “help one to name and understand one’s own and other emotions” (p. 40). Sturm et al. (1987) were conscious that slowing down the pace of televised shows was not a realistic recommendation: “instead, the plea is for structurally meaningful pauses” (p. 41). It is interesting to note that they pointed out potential implications for democratic processes if half-second transitions and pauses were not included in television content. They argue that the perceptual overflow due to unexpected adaptations to rapidly changing media harms verbal internalization and hinders comprehension and retention of information by the audience. If this process of hindered comprehension and retention is repeated across large-scale television audiences, they fear that large parts of the population would not have access to adequate information, thus harming their participation in social and democratic processes.
While Sturm’s analysis of the missing half-second focuses on the implications of this interval in the context of television and media perception, Massumi builds his arguments on experiments done by neuroscientist Benjamin Libet measuring brain activity during psychological tasks. In the second section of “The Autonomy of Affect,” after establishing the important distinction between affect and emotion (commenting on Sturm’s snowman short-film study), Massumi (1995) examines Libet’s studies demonstrating that, for a lapse of half a second, “what we think of as ‘higher’ functions, such as volition, are apparently being performed by autonomic, bodily reactions occurring in the brain but outside consciousness” (p. 90). For Massumi (1995), the elusive half-second may be missing from consciousness, but that does not mean it is not acting in the body, as he explains: “the trace of past actions including a trace of their contexts were conserved in the brain and in the flesh, but out of mind and out of body understood as qualifiable interiorities” (p. 91). In this apparently missing half-second, incipiencies are at play in our body (but out of consciousness): “they are tendencies—in other words, pastnesses opening onto a future, but with no present to speak of. For the present is lost with the missing half-second, passing too quickly to be perceived, too quickly, actually, to have happened” (Massumi, 1995: 91). The missing half-second, according to Massumi, is where the potential, or the virtual, lives to be actualized, expressed, and experienced.
Contrasting Sturm’s and Massumi’s missing half-seconds, one notices a striking difference: Sturm’s missing half-second is about qualification, or the process of indexing meaning by the user. Their plea to television producers is for a slower pace in television shows so that viewers have enough time to verbalize or categorize content. On the other hand, Massumi/Libet’s missing half-second is about intensity, or incipience. Commenting on television and Internet experience, Massumi describes the interruptions, the fast cuts of video clips, the zapping, and the Internet surfing, and concludes that media experience is increasingly made of cuts and suspensions, beginnings without clear endings: incipience, which is the mark of affect. These incipiencies, more than content itself (or qualification), are key to understanding the affective dimension of media experience, including short-video applications like TikTok. Just as recommendation algorithms need to combine sensorimotor contingencies and serendipity around an indeterminacy center, the TikTok user experience can only be understood when considered together with the unpredictable suspension of action-reaction that characterizes the affective interval.
The missing half-second is present in both qualification and intensity, and it plays a crucial role in shaping TikTok’s user engagement and platform dynamics. The platform’s rapid, seamless video transitions, the unpredictable nature of the next suggested video, and the algorithm’s ability to capitalize on micro-temporal user behaviors all exploit the affective interval. The algorithm does not wait for explicit user actions such as likes or comments; instead, it picks up on millisecond-level pauses before a user swipes away from a video. If a user hesitates, even momentarily, the system interprets this as latent interest, amplifying similar content in future recommendations. This process bypasses deliberate user decision-making, operating within the pre-conscious realm of affective response before a user can fully rationalize their viewing habits. In addition, TikTok’s fast-cut editing styles, jump cuts, and sudden shifts in tone and emotion contribute to the affective intensity that fills this missing half-second. The unpredictability of what comes next further sustains engagement, reinforcing a compulsive “just one more video” behavior that reflects the incipiencies described by Massumi: the open-ended, pre-conscious state where multiple affective potentials exist before being solidified into conscious experience. The integration of serendipity into recommendation mechanisms brings unexpected content into a user’s personalized feed, introducing moments of surprise, maintaining a sense of novelty and excitement, and keeping users engaged in a continuous loop of discovery. TikTok’s recommendation algorithm operationalizes the missing half-second by converting micro-temporal user behaviors into finely tuned predictive engagement mechanisms. By capitalizing on pre-conscious affective responses, rapid content transitions, and an unpredictable yet highly personalized feed, the platform maximizes immersion while minimizing moments of disengagement.
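The logic of inferring interest from hesitation rather than from explicit actions can be illustrated with a small sketch. This is not TikTok's actual implementation, which is not public; it is a toy model under stated assumptions. The `ViewEvent` fields, the scoring formula, the replay bonus, and the update rate are all hypothetical choices made only to show how dwell time alone, with no like or comment, could shift what a feed favors.

```python
from dataclasses import dataclass


@dataclass
class ViewEvent:
    """One hypothetical viewing event; every field name is an assumption."""
    topic: str
    watch_ms: int      # time the video stayed on screen before the swipe
    video_ms: int      # total length of the video
    replays: int = 0   # number of times the video looped


def implicit_interest(event: ViewEvent) -> float:
    """Map dwell time and replays to a score in [0, 1], with no explicit like."""
    completion = min(event.watch_ms / event.video_ms, 1.0)
    replay_bonus = min(0.25 * event.replays, 0.5)
    return min(completion + replay_bonus, 1.0)


def update_topic_weights(weights: dict[str, float], event: ViewEvent,
                         rate: float = 0.2) -> dict[str, float]:
    """Nudge the topic's weight toward the new score (exponential moving average)."""
    old = weights.get(event.topic, 0.5)
    new = (1 - rate) * old + rate * implicit_interest(event)
    return {**weights, event.topic: new}


# A lingering view raises the topic's weight; a quick swipe lowers it.
weights = {"cooking": 0.5, "sports": 0.5}
weights = update_topic_weights(weights, ViewEvent("cooking", 9_000, 10_000))
weights = update_topic_weights(weights, ViewEvent("sports", 500, 10_000))
```

In this toy model the user never expresses a preference deliberately: the weights drift purely as a function of how long each video held the screen, which is the sense in which such a system operates before conscious qualification.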
Conclusion: balancing prediction and indeterminacy in TikTok
Prediction and indeterminacy should be considered side by side in the understanding of social media experiences, such as TikTok. Predictive models tell part of the story; probability is, by definition, a calculation based on the transposition or rearrangement of what has already taken place or what is in progress. Didier Debaise and Isabelle Stengers (2016) have warned that the probable belongs to a logic of conformity: what has counted in the past, what makes it possible to characterize it, will retain this power in the future. The possible, on the other hand, imports the irruption of other ways of feeling, thinking, and acting, which can only be envisaged in the mode of insistence, undermining the authority of the present to define the future (Debaise and Stengers, 2016: 87). To fully understand media and algorithmic experience, it is important to also look at the possibilities of indeterminacy rather than only the predictions. As Amoore (2020) argues, algorithms also embody propensities and possibilities “generated through multiple and contingent relations” (p. 10). To understand TikTok’s success, one has to look into its ability to balance these seemingly contradictory elements of prediction and indeterminacy. By creating an “indeterminacy center,” TikTok has managed to combine the power of predictive algorithms with the allure of serendipitous discovery.
The implications of this research extend beyond TikTok and into the broader realm of new media, affective computing, communication, and digital experience design. As platforms continue to evolve, the lessons learned from TikTok’s approach could inform the development of more engaging, diverse, and emotionally resonant digital spaces. Recent work (Milano et al., 2020; Reviglio, 2019) suggests that serendipity can be understood as an emergent design and ethical principle in digital information environments, as it plays a crucial role in counteracting the narrowing effects of personalization by supporting media pluralism. Reviglio (2019) highlights that incidental exposure to news and public information, a key concern for media and journalism scholars, can be reconceptualized through the lens of serendipity. Rather than treating these encounters as peripheral or accidental, Reviglio (2019) frames them as ethically and epistemologically valuable events that can challenge users’ existing beliefs, stimulate critical thinking, and encourage engagement with unfamiliar perspectives. In this way, designing for serendipity offers a framework for rethinking recommendation systems not just as tools of engagement, but as elements of civic responsibility, capable of fostering informed and reflective publics in algorithmically curated media environments. Moreover, a report by the Ada Lovelace Institute (Jones, 2022) on public media highlights the importance of ensuring exposure to diverse and unexpected content as part of the broader mission to inform, educate, and entertain, noting that the absence of serendipity risks narrowing audiences into filter bubbles and undermining democratic values of diversity.
The practical application of serendipity as a design principle in short-video social media platforms involves implementing “indeterminacy centers” that balance algorithmic prediction accuracy with serendipitous discovery, incorporating beyond-accuracy metrics, and designing systems and interfaces that harness pre-conscious affective responses. Recent literature (Jones, 2022; Milano et al., 2020; Reviglio, 2019) emphasizes that to counter the risk of users being locked into narrow or compulsive cycles of interaction, one should design systems that balance personalization with transparency, enabling audiences to understand when and how algorithms are shaping their choices and to exercise meaningful control over them. This includes providing options for users to adjust algorithmic parameters, such as weighting serendipity, diversity, or novelty over accuracy, and creating moments of interruption to pause the continuous consumption of content (echoing Sturm’s recommendation for “meaningful pauses”). The Ada Lovelace Institute’s report (Jones, 2022) notes that younger users increasingly manage multiple accounts, devices, and privacy settings to deliberately shape recommendation algorithms for different purposes. In response, the report proposes several approaches: allowing linked profiles to support different “internet personas,” enabling joint profiles that aggregate the preferences of couples, groups, or communities, and developing coexisting recommendation systems that users could switch between depending on context, such as elections or breaking news. Formulated in the context of public service media, these approaches could also inform explorations in digital social media recommendation systems.
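A user-adjustable weighting of serendipity against accuracy could be sketched as follows. This is an illustrative toy, not a description of any existing platform: the `Candidate` fields, the scoring formula, and the `serendipity_weight` setting are assumptions introduced here. Following the definition of serendipity as unexpected yet relevant, the serendipity term multiplies relevance by unexpectedness, so that content which is merely random is not rewarded.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate video; field names are assumptions."""
    video_id: str
    relevance: float       # predicted fit with past behavior, in [0, 1]
    unexpectedness: float  # distance from the user's usual content, in [0, 1]


def rerank(candidates: list[Candidate], serendipity_weight: float) -> list[Candidate]:
    """Order candidates by a blend of accuracy and serendipity.

    serendipity_weight is a user-facing setting in [0, 1]:
    0 ranks purely by predicted relevance, 1 purely by serendipity.
    """
    if not 0.0 <= serendipity_weight <= 1.0:
        raise ValueError("serendipity_weight must be between 0 and 1")

    def score(c: Candidate) -> float:
        serendipity = c.relevance * c.unexpectedness  # unexpected yet relevant
        return (1 - serendipity_weight) * c.relevance + serendipity_weight * serendipity

    return sorted(candidates, key=score, reverse=True)


feed = [
    Candidate("familiar", relevance=0.9, unexpectedness=0.1),
    Candidate("surprising", relevance=0.7, unexpectedness=0.9),
    Candidate("random", relevance=0.1, unexpectedness=1.0),
]
accuracy_first = [c.video_id for c in rerank(feed, 0.0)]
serendipity_first = [c.video_id for c in rerank(feed, 1.0)]
```

Exposing a single weight of this kind is one conceivable way of giving audiences the "meaningful control" discussed above; diversity, novelty, or coverage terms could be added to the blend in the same manner.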
Moreover, the interplay between serendipity and doomscrolling highlights a broader challenge in digital social media design: how to balance unexpected discovery and pre-conscious responses with the imperative to promote healthy user engagement patterns. For policymakers and regulators, this necessitates developing ethical frameworks for the responsible use of micro-temporal behavioral data, establishing guidelines that balance engagement optimization with user well-being, and creating policies that support media pluralism through designed serendipity. For users, this research underscores the importance of digital literacy education that enables individuals to recognize how platforms utilize pre-conscious responses and to distinguish between healthy serendipity and addictive engagement patterns, such as doomscrolling. Collectively, these recommendations advocate for an approach to short-video social media platform design that considers not merely engagement metrics, but the broader implications of serendipity and prediction in recommendation algorithms for individual well-being and democratic discourse within an increasingly algorithmically mediated information environment. A promising avenue for future research would be to investigate other beyond-accuracy prediction metrics in recommendation algorithms and their relationship to serendipity within the context of digital social media, where dimensions such as diversity, novelty, and coverage may significantly influence user engagement and the broader dynamics of platform interaction.
This study also underscores the importance of interdisciplinary approaches in understanding contemporary digital media experiences. By integrating insights from cognitive science, affect theory, and media studies, researchers, designers, and policymakers can develop a more nuanced and comprehensive understanding of user engagement and platform design, particularly regarding its embodied dimensions (such as visual perception, reward learning, and emotion processing). The convergence of these theoretical frameworks enables an examination of how digital platforms operate not only as technological systems but also as affective environments that engage users through emotional, sensory, and cognitive channels. More specifically, this research contributes to understanding how TikTok’s recommendation algorithm operationalizes the missing half-second by classifying micro-temporal user behaviors and transforming them into predictive engagement mechanisms. By capitalizing on users’ pre-conscious affective responses, implementing rapid content transitions that prevent cognitive processing delays, and maintaining an unpredictable yet highly personalized recommendation feed, the platform creates a seamlessly immersive experience. This interdisciplinary study reveals how TikTok’s recommendation algorithm mobilizes user behavior data to mimic human knowledge of sensorimotor predictions while simultaneously integrating elements of serendipity, demonstrating that short-video social media engagement relies not only on content relevance and prediction accuracy but also on the orchestration of embodied, affective, and sensory elements that operate below the threshold of conscious awareness.
Footnotes
Acknowledgements
The author thanks the anonymous reviewers for their very helpful feedback.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES).
