Abstract
IRL (“in-real-life”) live streaming turns public space into an interactive broadcast structured by live chat, monetization tools, and real-time engagement metrics. At its extreme, “deviant IRL” involves streamers staging public provocation as content. This article asks under what conditions participatory visibility converts being watched from a mechanism of discipline into an incentive for public norm violation. Examining two critical cases (Vitaly Zdorovetskiy and “Johnny Somali”) through live stream content analysis of archived video, chat logs, news coverage, and institutional records, I develop the concept of the interactive synopticon, in which the many not only watch the few, but intermittently steer them through live prompts, paid text-to-speech, and metrified feedback. I identify three interlocking mechanisms—live audience steering, a reinforcement-amplification loop, and lagged accountability—that explain how participatory visibility generates short-run escalation while producing durable records enabling institutional sanction, with implications for platform governance and the uneven distribution of sanction across race, nationality, and gender.
Introduction
IRL (“in-real-life”) live streaming turns public space into a continuous, mobile broadcast. Streamers film themselves moving through streets, shops, and transit while responding to live chat. Unlike edited creator video, IRL is structured by real-time, quantified interactivity: concurrent viewers, chat volume, donations, subscriptions, and related cues make attention legible as it happens (Johnson, 2024; Ruberg et al., 2019; Tran, 2022; Woodcock and Johnson, 2019). Visibility here is not only exposure but a feedback environment in which platforms and remote audiences make attention measurable and actionable.
This matters because engagement is both a cultural value and a business logic on live platforms: ranking and recommendation systems reward content that generates strong reactions, feeding popularity signals into distribution and monetization (Bishop, 2019; Bucher, 2018; Gillespie, 2018). In Davies’ (2023, 2024) “reaction economy,” intensities of response (laughter, outrage, shock) are converted into data, reach, and revenue. IRL tightens this conversion by making evaluation live: creators can watch metrics update, receive donation alerts, and adjust conduct in real time (Ruberg et al., 2019; Woodcock and Johnson, 2019). Audiences are not only spectators within networked publics (Boyd, 2010); some viewers become intermittent participant-viewers through chat, donations, and especially paid text-to-speech (TTS), injecting prompts and speech into unfolding encounters in physical space.
A long tradition of social theory treats visibility as behaviorally consequential and often order-producing. Across different levels of analysis, Goffman (1959) links it to impression management and the maintenance of a workable definition of the situation; Durkheim (1895, 1984 [1893]) to moral boundary reaffirmation through public reaction and sanction; and Foucault (1977) to discipline through the internalized possibility of inspection. Despite their differences, these perspectives share a broad expectation that being seen structures conduct through accountability and evaluation, often in ways that support interactional, normative, or moral order. Yet platformed visibility is not uniformly constraining. In live streaming cultures, boundary-pushing can function as an attention strategy, producing engagement spikes, wider exposure, and income (Bishop, 2019; Bucher, 2018; Davies, 2024). This article asks when, and through what observable processes, visibility shifts from discipline to stimulus, and how live streamers, audiences, and platforms convert observation into a trigger for norm-breaking.
I examine this puzzle through IRL’s extreme end, deviant IRL, where creators treat public space as a stage for norm-violating spectacle. Because platform taxonomies for IRL have shifted over time (e.g., Twitch’s reclassification of “IRL” within “Just Chatting”), I use “IRL live streaming” as a genre label rather than a fixed platform category (Johnson, 2024; Tran, 2022). Much IRL content is mundane, pro-social, or tightly moderated, and increased visibility can still cue restraint. The inversion analyzed here is most likely when (a) participation is live and monetized (chat/TTS/donations), (b) engagement metrics are continuously visible and treated as evaluative truth, and (c) a subcultural ethos frames disruption as “content” worth rewarding (Ruberg et al., 2019; Tran, 2022; Woodcock and Johnson, 2019). Although deviant IRL is an extreme case, the broader configuration of networked publics co-producing action under visibility incentives also appears in other mediated spectacles (e.g., protest live streams, reality television, and audience-funded vigilantism), where spectatorship can blur into participation and escalation.
High-profile cases make this inversion legible. Between 2023 and 2025, Vitaly Zdorovetskiy and “Johnny Somali” (Ramsey Ismael), the two cases examined here, broadcast themselves antagonizing bystanders, trespassing, and escalating disruptions in response to engagement cues, culminating in backlash, platform enforcement, and offline legal consequences (see section “Findings”). Related dynamics appear beyond niche deviant IRL: in Kai Cenat’s 2023 Union Square giveaway, live promotion and audience mobilization preceded crowd disorder and criminal charges (Associated Press, 2023), showing how IRL’s liveness and audience aggregation can spill into offline settings even when disruption is not the content strategy.
To explain how participatory visibility becomes inducement rather than restraint, this article develops the concept of the interactive synopticon, which extends Mathiesen’s (1997) synoptic configuration to a live, monetized environment in which the many do not only watch the few but can intermittently intervene through chat, paid TTS, and metrified feedback. Existing frameworks address adjacent elements—synoptic mass-mediated visibility, algorithmic threats of invisibility (Bucher, 2018), and the monetization of reaction (Davies, 2023, 2024)—but none specifies how these dynamics operate together when audience input is live and actionable.
Two terms warrant clarification at the outset because they carry specific, narrow meanings throughout. “Deviance” follows Becker’s (1963) labeling sense: contextual norm violations staged as entertainment in public space (harassment, intimidation, disruption, trespass), distinct from “misconduct” (violation of platform rules) and “crime” (violation of law), though these can overlap. This usage explicitly excludes non-normative identity expressions, which are often punitively designated as “deviant” and subjected to disproportionate visibility scrutiny (Gray, 2014; Ruberg et al., 2019). “Participant-viewers” denotes the subset of viewers who exercise intervention capacity, as distinct from the broader audience, who watch but do not actively shape what unfolds.
The article proceeds as follows. The theoretical framework develops the interactive synopticon through classical theories of visibility, the synopticon literature, and accounts of algorithmic power, crowd dynamics, and the reaction economy. The “Data and methods” section describes the live stream content analysis, case selection logic, and analytical approach. The findings present the three mechanisms across both cases. The conclusion addresses implications for platform governance, responsibility, and the uneven distribution of sanction across race, nationality, and gender.
Theoretical framework
To theorize visibility, deviance, and interactional order, I draw on three classical traditions that address “being seen” at different levels of analysis. Foucault (1977) treats visibility as central to modern disciplinary power: conduct is rendered observable, comparable, and assessable through techniques of surveillance and normalization. Durkheim (1895, 1984 [1893]) frames deviance and punishment as collective processes: shared reactions to transgression clarify moral boundaries and can reaffirm social solidarity. Goffman (1959, 1967) analyzes visibility at the level of the interaction order, where actors manage impressions and “face” before specific audiences in particular settings. I use these perspectives as distinct lenses on how being seen can organize conduct: through discipline (Foucault), boundary-work through reaction (Durkheim), and situated performance and face-work (Goffman). Because these traditions theorize visibility at different scales and through different mechanisms, I treat them as analytically complementary rather than as a single integrated theory.
In Discipline and Punish, Foucault (1977) reads Bentham’s Panopticon as a diagram of disciplinary power. Discipline is not reducible to punishment or to literal, continuous watching; it names a set of techniques that link visibility to assessment and correction so that individuals come to regulate themselves under the possibility of inspection. Foucault locates this effect in interlinked mechanisms: hierarchical observation, normalizing judgment, and the examination. Through these, visibility is coupled to documentation, comparison, and correction, producing what he calls “docile” and “useful” bodies.
Durkheim (1895, 1984 [1893]) starts from a different premise: deviance is normal, and the analytic object is the social response it provokes. Condemnation, ridicule, exclusion, and punishment activate collective sentiments and publicly reaffirm what counts as acceptable, thereby clarifying moral boundaries and (often) strengthening solidarity. In this account, moral order is maintained not only through internalized restraint but also through the public visibility of reaction and sanction.
Goffman (1959, 1967) shifts the unit of analysis to situated interaction. Social life is dramaturgical in the sense that actors sustain performances for particular audiences, using available signs to maintain a workable definition of the situation. Frontstage and backstage are interactional regions, and impression management does not imply conformity: actors can strategically manage impressions while bending norms or provoking conflict, depending on what counts as credible, appropriate, or face-saving in context. Visibility matters because it structures the possibilities of embarrassment and the face-work and repair through which interactional order is sustained when performances are challenged.
IRL live streaming brings these perspectives into a shared empirical terrain without collapsing their differences. The relevant “gaze” is rarely a single institutional observer; it is a dispersed audience, with a subset of participant-viewers, mediated through platform interfaces and metrified cues that streamers monitor continuously. These metrics function as a vernacular “examination.” They render conduct comparable and correctable in real time, prompting continuous adjustment toward engagement success (Foucault, 1977; Woodcock and Johnson, 2019). Discipline here does not suppress transgression; it normalizes attention-production itself, orienting streamers to optimize for visibility thresholds that may be reached through disruption. At the same time, Durkheim’s deviance-and-sanction dynamic becomes temporally segmented. A live chat may reward norm violations as entertainment in the moment, while wider publics or institutions condemn and punish them later as clips circulate beyond the stream. IRL streaming also intensifies Goffmanian region problems: monitoring metrics, responding to prompts, narrating intent, and attempting repair frequently occur on-camera, blurring the boundary between frontstage and backstage so that performers must manage a workable definition of the situation across co-present bystanders and remote audiences at once (Goffman, 1959, 1967).
Notably, this frontstage/backstage collapse is not unique to IRL streaming; live television can also stage the “backstage” as entertainment. What distinguishes IRL live streaming is not the collapse itself but what liveness does to it. As Van Es (2017) argues, liveness is not an ontological property of technology, but a construction produced through the interaction of institutions, technologies, and users, drawing on configurations of real time and sociality to generate a sense of authenticity, one that depends on concealing its own constructedness to work. In IRL streaming, this concealment is structurally intensified. The metrified markers of liveness (visible viewer counts, real-time chat velocity, donation alerts) are not incidental but constitutive of the live event itself, and audiences can actively shape what unfolds through donations, TTS, and directional prompts. The “backstage” becomes another scene made to seem authentic and, crucially, entertaining through these conditions of liveness, and repair sequences are performed under the same evaluative gaze as escalation.
Rather than presuming that visibility is uniformly disciplinary and conducive to social order, this framework treats visibility as an interactional and infrastructural arrangement whose effects depend on how observation, evaluation, and reaction are organized in real time and across the stream’s afterlives.
From synopticon to interactive synopticon
The move from panopticon to interactive synopticon draws on several lines of work that problematize, extend, or correct Foucault’s (1977) institutional diagram of visibility. The earliest and most direct challenge comes from Deleuze’s (1992) Postscript on the Societies of Control, which advances a periodization argument: disciplinary enclosures are giving way to control through continuous modulation of flows, access, and circulation. Read strictly, Deleuze claims that the panoptic model is being historically superseded rather than extended. I do not endorse the strong supersession claim, but I take from Deleuze the analytic that contemporary power increasingly operates through modulation of access and visibility rather than fixed institutional inspection. It is an analytic that helps describe how platform metrics regulate streamers continuously rather than episodically.
Latour’s (2005) notion of the oligopticon underscores that visibility is always assembled through many partial, situated sites of seeing rather than a single all-seeing gaze: visibility is plural, fragmented, and infrastructurally produced. Bucher’s (2012, 2018) account of algorithmic power operationalizes this in platform contexts. She argues that the threat of invisibility disciplines creators: the possibility of being ignored, demoted, or algorithmically buried induces continual optimization toward what platforms reward. Latour and Bucher are not seamlessly compatible. Latour’s assemblage thinking resists positing any unified seeing force, including a coherent “platform logic,” whereas Bucher (and adjacent work by Gillespie, 2018 and Davies, 2023, 2024) does treat platform optimization as a relatively coherent regime of visibility. I follow Latour’s methodological caution—there is no single algorithmic gaze; visibility is produced through heterogeneous sites and partial sightings—while accepting Bucher’s empirical claim that, despite this plurality, platform ranking systems generate sufficiently regular pressures that creators adapt to them as if they were a coherent evaluative environment. In live streaming, that threat becomes live and legible through real-time metrics, tightening the coupling between action and evaluation and intensifying the pressure to remain watchable, gripping, and surfaced (Bucher, 2018; Woodcock and Johnson, 2019). Taken together, Deleuze, Latour, and Bucher reframe panoptic visibility as plural, infrastructural, and market-oriented rather than institutional and disciplinary in Foucault’s sense.
Mathiesen’s (1997) synopticon represents a different but equally important corrective. Writing in explicit dialogue with Foucault, Mathiesen argues that Foucault theorizes only one direction of the modern gaze (the few watching the many), and omits the countervailing mass-media structure in which the many watch the few. The synopticon names this second structure: mass-mediated spectacles, celebrity cultures, and broadcast media organize attention around a small number of highly visible performers and events, producing social integration through shared spectatorship instead of through discipline.
Crucially, for Mathiesen (1997), the panopticon and synopticon do not merely coexist but form a dual architecture he calls the “viewer society,” in which modern social control operates simultaneously through both directions of the gaze. Populations are governed both through what they watch and through the knowledge that they are watched. In Mathiesen’s formulation, this synoptic structure is organized through mass media, television above all, where the many are addressed as passive recipients of centrally produced images of the few. The synopticon has proven generative in digital media research precisely because social media and streaming platforms intensify this structure: being watched can generate capital, influence, and audience, and so visibility becomes a resource. Yet Mathiesen’s framework, as he formulated it, still assumes a largely passive audience. The many watch, but they do not intervene. This is the structural gap that IRL live streaming exposes.
IRL streaming extends the synoptic configuration: the audience’s gaze is not only extensive but intermittently actionable. I use the term “interactive synopticon” to describe a setting in which streamers are continuously visible to the many, while the many can periodically shape what unfolds through live interaction, monetization tools, and feedback. The baseline condition remains synoptic: attention concentrated on a few visible performers, with platform mechanics amplifying their reach. But what IRL changes is what visibility does within that synoptic structure. It converts spectatorship into intermittent intervention capacity and monetized, real-time evaluation. The interactive synopticon names this coupling of mass visibility with affordances that make reaction actionable (chat/TTS/donations) and with feedback systems that render escalation legible as instant gain.
The interactive synopticon is therefore a mid-level concept designed to explain when and how participatory visibility becomes an actionable incentive environment in public space, while remaining compatible with accounts of partial seeing and infrastructural control (Deleuze, 1992; Latour, 2005). In the interactive synopticon, obscurity itself becomes consequential. When viewership dips, chat slows, or donations stall, that drop registers as an immediate penalty, and streamers are incentivized to do something to recover attention. Where Bucher (2018) emphasizes optimization under the threat of invisibility, the interactive synopticon specifies how audience intervention capacity and monetized real-time feedback turn that pressure into a steerable escalation directed outward into offline settings.
Crowds, co-performance, and reaction as infrastructure
The preceding sections established the streamer’s orientation to metrified visibility and the platform architecture that amplifies and monetizes it. What they leave underspecified is the third structural element, the interactive public. In synoptic and platform-visibility research alike, audiences tend to appear as undifferentiated metric-sources whose role ends at attention. In deviant IRL, participant-viewers are structural inputs, supplying dares, scripts, financing, and continuous moral recoding that make escalation possible, direct it, and sustain it in real time.
Prior accounts of “performance crime” note that some illicit or transgressive acts are staged for visibility, with attention functioning as a reward (Surette, 2015; Yar, 2012). Deviant IRL live streaming sharpens this dynamic by collapsing the distance between act and audience: participant-viewers witness disruption as it unfolds and can actively coax, finance, or script the next moves through chat and paid TTS. The analytic shift is therefore from deviance performed for an imagined public to deviance co-produced with an interactive public in real time.
Classic crowd theory portrayed crowds as irrational and contagious (Le Bon, 1895), a view since critiqued for its racialized and gendered anxieties about mass society. Later work emphasizes collective behavior as norm-governed and context-sensitive. Thompson’s (1971) “moral economy” highlights shared expectations that structure crowd action, while Tarde (1903) foregrounds imitation and the circulation of affects and ideas. Contemporary work on digitally networked collectives describes “affective publics” organized around intensities rather than formal membership (Papacharissi, 2014), with dynamics shaped by anonymity, distance, and group identification (Brighenti, 2010; Reicher et al., 1995).
Two readings of audience participation in mediated culture sit in tension here. Jenkins’ (2006) participatory culture framework describes audiences as creative co-producers whose engagement empowers them and democratizes media production. Critical political-economy work (Andrejevic, 2002, 2004; Postigo, 2016; Terranova, 2000) reads the same activity as extraction: unpaid or low-paid creative input captured as platform value, often under a language of community that obscures exploitation. I take from Jenkins the descriptive point that audiences in convergent media are co-producers rather than passive recipients, but read deviant IRL through the political-economy lens. The features Jenkins celebrates (participation, co-creation, audience agency) are what produce escalation, harm to bystanders, and revenue for streamers and platforms. What participant-viewers “produce” with their paid prompts is, at its extreme, public norm violation that others must absorb. In the interactive live stream, these dynamics are folded into a co-performance where chat and paying participants continuously supply input that streamers may incorporate. The analytic claim is not that participant-viewers merely “influence” performers, but that platform affordances make their participation structurally consequential.
Deviance, labeling, and the interactionist tradition
This article uses deviance as a working analytical category, valuable for what it directs attention toward—social reaction as the constitutive process—rather than as a settled theoretical claim. Becker’s (1963) argument that deviance is not a property of the act, but the product of social reaction and labeling, is well suited to the empirical object here, where deviant acts are staged for participant-viewers who define and reward them in real time. The question is not whether a given act is inherently transgressive but how labeling operates across the multiple publics (the stream’s subculture, bystanders, wider publics, and institutions) that assign, contest, and enforce deviant designations.
Becker’s account belongs to a broader interactionist tradition that both precedes and extends it. Lemert (1951) drew a temporal distinction between primary deviance (the initial act) and secondary deviance (the reorganization of identity and conduct around an internalized deviant label). This maps directly onto both cases in this article: having publicly embraced the identity of a streamer who does transgressive IRL content, streamers may treat further escalation as role-consistent behavior, making the label itself a driver of continuation rather than a deterrent. Work on moral panic (Cohen, 1972) and deviancy amplification (Wilkins, 1964) shows how media and institutional reaction to deviance can intensify rather than suppress it, producing more of the behavior they condemn through escalating coverage and punitive attention. This dynamic is visible in both cases: circulating clips and press coverage brought wider audiences and amplified notoriety before enforcement arrived.
This tradition has also drawn substantive critique. Scholars in critical and Marxist criminology have argued that labeling theory underspecifies the structural conditions (class, race, gender) that organize who gets labeled, by whom, and with what consequences (Garland, 2001). Goode and Ben-Yehuda (1994) show how deviance labels can be instrumentalized by media and political actors, producing institutional responses disproportionate to the original harm. Wacquant (2009) demonstrates how punitive governance operates through racial and spatial sorting, extending the insight that deviance labels are instruments of power rather than neutral social judgments. Most directly relevant here, Gray (2014), in her study of Race, Gender, and Deviance in Xbox Live, shows that digital environments reproduce and intensify these asymmetries, with Black users, women, and queer users facing punitive visibility and sanctioned harassment for behaviors that white male users perform with relative impunity.
These critiques specify the limits of the framework rather than undermine it. The demarcation this article draws, between deviance as staged norm violation for profit and deviance as the punitive over-surveillance of non-normative identity, is an analytical scope condition, not a normative adjudication. It identifies what this framework is designed to explain (how deliberate, monetized norm violation is co-produced and sanctioned on platforms) without claiming that the second category is less important or that the normative dispute over what “counts” as deviance is settled. That dispute is real and consequential (Garland, 2001; Gray, 2014; Wacquant, 2009). The two mechanisms operate through different logics and require different governance responses, and the demarcation keeps them from being conflated. Deviance ascriptions in “deviant IRL” are also not limited to identity-based categories; they involve social position (Western male outsider operating in non-Western public space), platform capital (large follower counts conferring informal legitimacy), and situational status (streamer as focal performer with audience backing). For working purposes in this article, “sanctions” refers to reputational, platform, or legal punishment (demonetization, suspension/ban, arrest, prosecution); “bystanders” are co-present members of the public who are filmed; and “wider publics” include journalists, commentators, and regulatory and legal institutions beyond the stream.
Deviant IRL streaming extends a longer tradition of mediated spectacle in which shock and norm-breaking are mobilized for entertainment, from stunt and prank television (e.g., Jackass, Punk’d) to early YouTube challenge cultures (Hobbs and Grafe, 2015; Jarrar et al., 2020). What it removes is the post-production buffer: there is no gatekeeping, audiences shape events as they unfold, and liveness, interactivity, and rapid circulation collapse into a single real-time event. This resonates partially with Debord’s (1994) account of the spectacle, in that social relations under platform capitalism are mediated through commodified images that displace lived experience. But Debord’s spectacle presupposes a passive spectator confronting a pre-produced image, whereas in deviant IRL, viewers are co-producers of the event as it unfolds through chat, donations, paid TTS, and clipping. I therefore invoke Debord only to name the commodification of social experience and depart from him on spectator agency, a departure I treat as constitutive of the interactive synopticon.
Related dynamics appear in other subgenres. Work on “trash streaming” shows participant-viewers paying to watch increasingly degrading or risky performances (e.g., eating inedible foods on command or staying live through prolonged sleep-deprivation challenges), illustrating how norm-breaking can be monetized through attention and donations (Bek and Popiołek, 2019; Jones, 2020; Wojewoda and Bergcholc, 2025). Deviant IRL differs in that disruptions are often directed outward at bystanders in public space, but the underlying logic of provocation as attention technology is comparable.
The labeling dynamics described above are intensified by the platform economics that make deviant IRL lucrative. Engagement is not only measured; it becomes an input to distribution and revenue (Davies, 2023, 2024). Recommendation systems amplify content already generating strong reactions, producing feedback dynamics that favor sensational material (Bishop, 2019; Bucher, 2018; Gillespie, 2018; Massanari, 2017). Streamers earn through subscriptions, donations, and advertising that scale with audience size and intensity, making Andrejevic’s (2002, 2004) “work of being watched” unusually quantified: maintaining attention is labor, and metrics become its wage signals. Through donations and TTS, participant-viewers can pay to inject prompts and provocations, making their input not just value-producing but a revenue-generating driver of escalation (Postigo, 2016; Terranova, 2000).
Data and methods
This article asks: under what conditions does participatory visibility convert being watched from a mechanism of discipline into an incentive for public norm violation? To examine this, I conduct a live stream content analysis of two streamers who became internationally notorious between 2023 and 2025 for broadcasting disruptive, norm-breaking behavior in public.
Case selection
Vitaly Zdorovetskiy and “Johnny Somali” (Ramsey Ismael) are the two cases selected for this analysis as extreme instances (Flyvbjerg, 2006) in which attention, interaction, and sanction are unusually observable. Vitaly is a Russian-born, American-based YouTuber with over 10 million subscribers across platforms (Social Blade, 2026), who in 2023 shifted to Kick for IRL streams increasingly centered on public disruption: fake thefts, harassment, escalating stunts. In early 2025, while streaming in the Philippines, he was arrested by Philippine authorities and faced multiple criminal charges following incidents including fleeing with a security guard’s hat and threatening to rob an elderly woman. Johnny Somali is an Ethiopian-American streamer notorious for provocations in Asia. After being banned from Twitch, he migrated across platforms (Kick, Rumble, and Parti). His content routinely antagonized Japanese and Korean bystanders, including trespassing on construction sites while yelling “Fukushima!” and playing loud North Korean military music in South Korea (Korea JoongAng Daily, 2024), and he was later arrested and indicted in South Korea (Muzaffar, 2024).
Both cases meet four notoriety criteria: multi-country news coverage, documented platform enforcement, viral clip circulation beyond the live audience, and on-screen viewer counters repeatedly showing counts in the thousands. I treat them as critical cases for three reasons. First, both performed conspicuous norm violations across multiple cultural contexts, prompting international backlash. Second, escalation was tightly coupled to real-time participation: chat (often via TTS and donations) proposed and financed stunts, while both streamers visibly monitored viewership and monetization cues during episodes of disruption. Third, each case culminated in serious offline consequences (arrests and criminal charges) alongside permanent platform bans, enabling analysis of the temporal relationship between live performance, circulation, and eventual intervention. Both were American-based (Johnny is a U.S. citizen; Vitaly is a Russian citizen and U.S. permanent resident) and streamed abroad while benefiting from Western male mobility and outsider status, which initially afforded a degree of impunity. While extreme, their performance styles also resemble milder boundary-pushing in mainstream creator cultures where controversy is courted for engagement (e.g., Kai Cenat, Jack Doherty, JiDion, Ice Poseidon), making them stress tests for how interactive audiences and platform incentives can drive deviance.
Data collection
The primary dataset comprises approximately 42 hours of archived live stream video and corresponding chat logs (2023–2025). Because the streamers’ channels were frequently banned or deleted, I sourced many recordings from public archives and third-party reuploads (e.g., YouTube and Reddit). Archived live streams provide leverage because they preserve interactional sequencing: how prompts, metric shifts, and decisions occur in short intervals before later clipping, commentary, or journalistic reframing. The recordings contain visible and audible traces of audience participation (chat, donation alerts, TTS), metrified engagement cues (concurrent viewers, chat velocity, trending/front-page talk), and unfolding public encounters. This enables analysis of how attention, interaction, and escalation are coordinated in real time. I focused on several widely circulated “high-drama” segments (e.g., the stream preceding Vitaly’s arrest in the Philippines and the stream of Johnny’s memorial desecration in Seoul) because they offer especially dense evidence; the analytic claims, however, rest on repeated patterns across the corpus. To verify events, I gathered 58 secondary sources, including news coverage (English-language and local outlets), official statements and press releases (e.g., from the Philippine Bureau of Immigration), and platform communications. I also reviewed platform policy documents (community guidelines, terms of service) across relevant services. The corpus additionally includes a small number of segments from Johnny’s streams in Israel, drawn on selectively as supplementary illustration. All data were publicly available; no private or sensitive personal data were accessed.
Analytical approach
This is a theory-building article conducted through what I term live stream content analysis, an approach that combines close sequential reading of archived video and chat logs with secondary source triangulation, specifically designed to capture the temporal unfolding of platform-mediated interaction. This differs from conventional content analysis of static or edited media, because the object is a temporally sequenced interaction (prompts, metric shifts, conduct, and reactions) rather than a stabilized text. Live stream content analysis preserves that sequencing, making feedback loops observable as processes and capturing how meaning, legitimacy, and accountability are negotiated in situ before being later clipped and reframed. As a methodological contribution, this approach may be applicable to other contexts where temporality and real-time audience response are constitutive of the phenomenon under study.
I used an abductive thematic analysis (Braun and Clarke, 2006), moving iteratively between the theoretical concepts developed in the framework and observations from the corpus. Using open and focused coding (Charmaz, 2006) in NVivo, I marked each instance of deviant or disruptive behavior and its immediate context: what preceded it, what audiences were doing (daring, praising, expressing concern, intensifying engagement, using TTS), and how the streamer responded. I also coded moments when streamers explicitly referenced motivations or metrics (e.g., “for the content,” citing viewer counts), treating these as evidence of how platform feedback was interpreted and acted upon. I wrote chronological process-tracing memos for each major incident and triangulated patterns against secondary reporting and platform documents to verify timelines and consequences. Coding stabilized after approximately 42 hours of footage, at which point additional segments yielded no new themes.
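The coding procedure described above can be illustrated with a minimal sketch. It assumes a simple record per coded instance; the field names, the `CodedInstance` class, and the `tally_audience_actions` helper are hypothetical and do not reproduce the NVivo codebook. The example records are illustrative paraphrases of incidents reported in the “Findings” section, not exported codes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodedInstance:
    """One coded instance of disruptive behavior (hypothetical schema)."""
    segment_tag: str                      # e.g. "[South Korea, YouTube, 2024]"
    preceding_context: str                # what came immediately before
    audience_actions: list = field(default_factory=list)  # e.g. "daring", "using TTS"
    streamer_response: str = ""           # how the streamer responded
    metric_reference: bool = False        # streamer explicitly cited viewer counts, etc.


def tally_audience_actions(instances):
    """Count how often each audience action code appears across instances."""
    counts = Counter()
    for inst in instances:
        counts.update(inst.audience_actions)
    return counts


# Illustrative records only (paraphrased from incidents described in the article).
instances = [
    CodedInstance(
        segment_tag="[South Korea, YouTube, 2024]",
        preceding_context="paid TTS dare in a convenience store",
        audience_actions=["daring", "using TTS"],
        streamer_response="refused the dare, converted it into a store disruption",
    ),
    CodedInstance(
        segment_tag="[South Korea, Parti, 2024]",
        preceding_context="streamer polls chat about prank-calling the police",
        audience_actions=["daring", "using TTS"],
        streamer_response="treated replies as approval and dialed",
    ),
    CodedInstance(
        segment_tag="[Philippines, Kick, 2025]",
        preceding_context="viewer count announced to the streamer",
        audience_actions=["praising", "intensifying engagement"],
        streamer_response="escalated to reach a stated viewer threshold",
        metric_reference=True,
    ),
]

counts = tally_audience_actions(instances)
for action, n in counts.most_common():
    print(f"{action}: {n}")
```

Keeping the segment tag, context, audience action, and streamer response in one record is one way to preserve the sequencing that the process-tracing memos rely on; other schemas would serve equally well.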
To avoid a deterministic account, I also coded moments of non-escalation: episodes in which provocations produced little metric movement, participant-viewers urged escalation but the streamer refused, or co-present constraints forced de-escalation or stream termination. These negative cases clarify the boundary conditions of the mechanisms identified next. In the “Findings” section, bracketed tags such as “[South Korea, YouTube, 2024]” refer to archived stream segments in the corpus, indicating geographic location, platform, and year of recording. The analytic claims rest on pattern consistency across multiple segments in both cases rather than any single viral clip.
Findings
Mechanism 1: Live audience steering
Across both cases, the audience is not a background public that merely witnesses deviance. Some viewers function as participants, supplying prompts, scripts, and provocations that can be taken up live and reorganize encounters around a second, digitally mediated audience. I call this live audience steering. Steering is consequential because it is operational input mid-encounter, delivered through chat and, most consequentially, paid TTS broadcast into physical space.
Steering typically follows a compact sequence: participant-viewers propose a next move (dare, instruction, provocation), the streamer treats it as actionable, and participant-viewers evaluate the outcome in real time, marking what “worked” and what should escalate next. Steering does not require strict obedience; it also operates when refusal is converted into an alternative transgression that preserves alignment with the audience’s preferred trajectory.
A convenience-store segment from Johnny Somali’s South Korea streams shows this clearly [South Korea, YouTube, 2024]. A US$3 TTS dare tells him to pour hot ramen soup on himself to “kill [his] sperm cells.” Johnny refuses the specific self-directed harm, yet treats the prompt as a demand for spectacle and converts it into a disruption in the store, dumping ramen onto the counter and provoking conflict with staff and bystanders. The analytic point is not “the interactive audience told him to do X, and he did X,” but the general dynamic: a prompt arrives, the streamer treats it as a cue that escalation is expected, and the encounter is reshaped accordingly.
The same segment shows why TTS is not merely “chat” but an intervention into offline space. A subsequent TTS donation, broadcast aloud in Korean, falsely portrays the streamer as a sexual threat to minors [South Korea, YouTube, 2024]. Rendered as audible speech in the store, the provocation becomes part of the shared interactional environment: it alters what bystanders can reasonably infer, heightens threat, and increases the likelihood of confrontation even if the streamer later disavows authorship. Here, the interactive audience is not only influencing the streamer’s intentions; it is supplying new interactional objects (audible accusations, insults, dares) that reorganize the situation for everyone present.
Steering also builds in responsibility diffusion, which further incentivizes taking up risky prompts. In some streams, steering is structured as a vote. For example, Johnny polls chat on whether to prank-call the police, and when replies fill with constant “1” (yes) and a US$5 TTS message reinforces the request, he treats it as approval and dials the emergency number [South Korea, Parti, 2024]. In another setting [South Korea, YouTube, 2024], a US$3 TTS broadcasts a violent threat toward bus passengers, while a US$10 donation triggers looping baby-cry audio that visibly upsets riders. When police confront him afterward, Johnny deflects blame by insisting he had “no control” over what TTS played and that it was the audience’s fault. Live audience steering thus distributes agency in two directions at once: it expands participant-viewer agency, enabling remote participant-viewers to script and inject provocations into public settings, while giving the streamer a readymade account that dilutes responsibility (“it wasn’t me, it was TTS”). This is one way the interactive synopticon differs from older accounts of synoptic visibility. The many do not simply watch the few; they gain intermittent control capacity over what the few do (amplified by distance and insulation), with the streamer acting as a hinge that converts remote daring into offline disruption.
Steering is sustained not only through explicit dares but through continuous moral recoding, the evaluative channel that makes direction effective. Across the corpus, participant-viewers rate escalation as success and restraint as failure. Messages like “LOL,” “I’m crying [of laughter],” and “W” (“win”) function as evaluative cues defining the “right” move in the moment. Under liveness, the streamer navigates two normative worlds simultaneously: the local moral order (where disruption invites sanction) and the stream’s evaluative order (where disruption can be rewarded as “content”).
Johnny’s streams show moral recoding operating through culturally specific taboo-breaking. In South Korea, the stream treats “Statue of Peace” memorials (depicting a young woman; honoring victims of wartime sexual slavery) as props. He climbs on the statue, kisses it, and performs sexualized gestures until locals intervene [South Korea, Rumble/YouTube, 2024]. The mechanism hinges not only on bystanders reading this as desecration, but on chat recoding it as achievement: cheering, encouraging him to “oil [the statue] up,” spamming “W,” and elevating him as “the King of IRL.” In Japan, the same logic appears when Johnny invokes the atomic bombings on a crowded train—“Hiroshima! Nagasaki! [America will] do it again”—calibrated to horrify commuters while many audience members treat the public disturbance as proof the performance is “working” [Japan, Kick, 2023]. By treating outrage and discomfort as “wins,” participant-viewers define escalation as competent execution rather than error, making further provocation, in these streams, the interactionally appropriate next move.
Conceptually, live audience steering advances a stronger account of participation than models centered on interpretation and circulation. In deviant IRL, participation becomes operational input that can redirect offline encounters in real time. Paid prompts are not just value-generating labor but purchases of priority and escalation that convert reaction into both revenue and direction (Postigo, 2016; Terranova, 2000).
Mechanism 2: Reinforcement-amplification loop
If live audience steering names the audience’s directional capacity, this mechanism names the feedback loop through which escalation is rewarded in real time and those rewards intensify the conditions that make further escalation attractive. In these streams, visibility is experienced less as ambient observation and more as (a) a continuously updated scoreboard—the live, on-screen display of concurrent viewer counts, chat velocity, and donation/subscription cues that the streamer monitors in real time—and (b) an amplifier (signals that the stream is being surfaced more widely, “front page/trending,” followed by newcomer influx and rapid clipping). Together, these features reorganize visibility’s disciplinary force. Rather than inducing restraint, visibility becomes a resource that can be actively produced, monitored, and defended through further disruption.
Across the dataset, the loop has a recurrent but contingent shape. Streamers narrate metrics (“we’re at X”), frame higher numbers as performance goals (“let’s hit Y”), and escalate to generate a spike. Crucially, the scoreboard rewards intensity without valence. It registers more viewers, faster chat, and more donation alerts, without distinguishing between admiration and outrage. Moral difference is flattened into a single axis (upward movement), so backlash can become usable reinforcement when it increases attention density.
Johnny Somali’s streams illustrate this “intensity without valence” logic in compressed form. In one South Korea sequence, about 1,580 people watched him chatting idly at 8:35 PM (KST), as displayed in the archived recording; by 9:05 PM (KST), after he shifted into a provocative public performance (twerking), concurrent viewership climbed to roughly 2,700 [South Korea, Parti, 2024]. Shock and restraint-talk (e.g., participant-viewers commenting “please go to bed,” “don’t actually harass [that worker]”) still increased view counts and chat velocity, feeding the same upward movement as supportive hype. Across other streams, he uses the same reaction-bait logic: trespassing on Japanese construction sites while yelling “Fukushima!” and blasting North Korean military music on South Korean buses [Twitch, 2023]. These stunts were calibrated to elicit confrontation and keep attention dense. At times, he states the conversion directly. After a local punched him on stream, he smirked and told his audience, “Punch me, assault me [. . .] it’s content” [Kick, 2024] and “we’re going viral . . . it’s entertainment” [YouTube, 2024]. During the Hiroshima/Nagasaki subway incident, he explicitly frames outrage as profit: “I’m at a thousand subs right now . . . I’m making money from this” [Japan, Kick, 2023].
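The viewership figures reported for the South Korea sequence above can be turned into simple growth measures. The sketch below is illustrative arithmetic only: it takes the approximate on-screen counts and timestamps quoted in the text as given, and the `viewer_growth` helper is a hypothetical name introduced here, not part of the original analysis.

```python
from datetime import datetime


def viewer_growth(start_count, end_count, start_time, end_time, fmt="%H:%M"):
    """Return absolute, percentage, and per-minute change in concurrent viewers."""
    elapsed = datetime.strptime(end_time, fmt) - datetime.strptime(start_time, fmt)
    minutes = elapsed.total_seconds() / 60
    absolute = end_count - start_count
    return {
        "absolute": absolute,
        "percent": 100 * absolute / start_count,
        "per_minute": absolute / minutes,
    }


# Approximate figures as reported in the text: about 1,580 viewers at 8:35 PM
# and roughly 2,700 at 9:05 PM (KST), both read from the archived recording.
growth = viewer_growth(1580, 2700, "20:35", "21:05")

print(f"Absolute change: {growth['absolute']} viewers")
print(f"Relative change: {growth['percent']:.1f}%")
print(f"Average rate: {growth['per_minute']:.1f} viewers per minute")
```

On these rounded inputs, the shift from idle chat to provocation corresponds to a gain of about 1,120 concurrent viewers, or roughly 71% in 30 minutes. Because the source counts are approximate, the result should be read as an order of magnitude rather than a precise measurement.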
In the reaction economy (Davies, 2023, 2024), reaction names a causal chain through which affect becomes consequential under platform conditions. Intensities of response are (a) expressed through platform channels (chat, TTS, donations, clipping, reposts), (b) captured as engagement metrics, (c) converted into distribution and revenue through ranking/recommendation systems, and then (d) returned to creators as incentives to repeat or intensify conduct that generates high-arousal feedback. Reaction is the relay that translates affect into measurable performance signals and then into material rewards, making “what gets a rise” an actionable guide for what happens next.
Vitaly’s Philippines streams show the same loop with explicit narration and visible scaling cues [Philippines, Kick, 2025]. When told he has roughly 2,400 concurrent viewers, he announces: “Let me show you how we get to 3K.” He immediately escalates through a riskier act (running into traffic and taking an older man’s bicycle), staging a high-arousal disruption to produce a spike. Afterward, he performs brief “repair work” (returning the bike, joking apology, cash tip), but the repair does not restore a stable interaction order; it functions as a bridge back to the metric. He checks and narrates the payoff: “2.5 thousand now? 2.6k!. . . This is what I wanted . . . This is content!” He then reorients to the next threshold, “Let’s break 5k, chat!”, converting audience presence into a serial numeric project even as offline risk accumulates. As numbers climb, he chains further public disruptions, grabbing a security guard’s hat and running, jumping onto a moving jeepney, and later threatening to rob an elderly woman, each treated as a bid to sustain momentum [Philippines, Kick, 2025]. It is also worth noting that Vitaly, like many large-scale streamers, operates with a small production team, including at least one cameraman who tracks and narrates viewer counts at key moments in the archived footage. This underscores that the “individual bad actor” framing is doubly inadequate. Indeed, escalation is co-produced not only by remote audiences but by the immediate production context itself.
Amplification becomes interactionally visible through placement cues and newcomer influx. During a high-intensity segment, Vitaly’s cameraman announces: “You’re at 3,293 [viewers]. . . still trending on the front page” [Philippines, Kick, 2025]. Almost immediately, the chat displays an influx of newcomers (“I’m new here, what’s going on?”) alongside a faster churn of reactions. The loop persists in part because repair and recovery are folded into the frontstage performance itself: the encounter remains continuously audience-oriented, so contrition and de-escalation are performed under the same evaluative gaze as the transgression that preceded them (Goffman, 1959, 1967). Johnny’s apology sequences operate on the same logic. After Japanese YouTubers confronted him over the Hiroshima/Nagasaki remarks, he apologized under pressure but in an exaggerated accent, rendering contrition interactionally ambiguous [Japan, Kick, 2023]. In both cases, repair functions less as withdrawal than as another scene within the stream. Region collapse can also extend beyond the stream. Events that would ordinarily pull a performer backstage (arrests, press briefings, or court appearances) can be folded into the ongoing narrative and managed under audience scrutiny, so sanction becomes another performance context rather than a clear interruption (Goffman, 1959, 1967). In South Korea, Johnny arrived at a Seoul court hearing wearing a red “Make America Great Again” hat, an extension of the provocateur persona he cultivates on stream. When courtroom rules required him to remove it, the legal proceeding itself became a stage for performed defiance.
Amplification also extends beyond the live moment through clip migration, giving high-arousal episodes a portable afterlife. Secondary reporting on both cases drew heavily on circulating clips to narrate events, expand awareness beyond platform subcultures, and frame conduct as a public issue. Taken together, the reinforcement-amplification loop specifies how participatory visibility can operate as an incentive rather than a deterrent. Escalation produces metrified rewards in real time, those rewards are treated as evaluative truth, and visible signs of widened circulation intensify the conditions that make escalation likely to continue.
Mechanism 3: Lagged accountability
The third mechanism explains why escalation can persist long enough to compound, and why it often ends abruptly. I term this dynamic lagged accountability. Rewards arrive immediately (attention, donations, notoriety), while sanctions typically arrive later (platform enforcement, arrest, immigration action, prosecution). Liveness produces a temporal asymmetry: the stream’s internal reward system runs on a fast clock, whereas institutions and platforms capable of imposing constraints tend to move on a slower one. At the same time, the visibility that rewards deviance creates an evidentiary afterlife, as streams and circulating clips become durable records that can later be reframed and mobilized to justify intervention.
Empirically, this mechanism is visible in three recurring signatures: (a) reward now, sanction later—spikes in engagement and donations occur with little immediate constraint; (b) enforcement triggered by external pressure—formal responses follow circulation, backlash, and reputational risk rather than the first violation; and (c) self-documentation becomes evidence—live streams and clips provide records used to reconstruct events and support intervention.
Vitaly’s Philippines episode illustrates this timing. During the live streams, disruption was rewarded through rising engagement and participant-viewer encouragement [Philippines, Kick, 2025]. Only after clips circulated widely and backlash intensified did institutional actors intervene. Operatives of the Philippine Bureau of Immigration (BI) arrested Vitaly, and the agency publicly designated him an “undesirable foreign national” in connection with the viral incidents (Bureau of Immigration Philippines [BI], 2025). In subsequent statements, the BI (2025) argued that the Philippines would not allow foreign guests to “abuse our hospitality and disrespect our people and our country,” exemplifying Durkheimian boundary-work: sanction operates as a public reaffirmation of norms once violations become widely visible. Platform containment also hardened: Kick banned Vitaly following his arrest (Complex, 2025), and the ban was lifted after his deportation in 2026 (BI, 2026). During a Department of the Interior and Local Government press conference following his arrest, officials screened excerpts from Vitaly’s live stream while he appeared in handcuffs, reframing his self-documentation as evidence of wrongdoing (ABS-CBN News, 2025). Footage originally produced for audience engagement thus became part of the material underpinning the charges filed against him.
This is what lagged accountability adds to the model. Deviant IRL persists not because surveillance is absent, but because the operative “surveillance” in the moment is participatory and metrified (structured around chat feedback, donations, and visible performance signals), while regimes capable of imposing meaningful constraint operate on a slower clock. The lag creates room for steering, reinforcement, and amplification to compound. Only later, once clips travel and wider publics and institutions intervene, does visibility flip from resource to liability.
Not everyone could have “gotten away with” as much for as long. Both streamers operated as American-based foreigners in Asia, benefiting from the relative impunity Western mobility and citizenship can sometimes confer. Vitaly, a white man, remarks on this explicitly in Manila, saying police were “too nice” and implying he would not have had the same “easy” experience making content in Russia or the United States [Philippines, Kick, 2025]. Johnny, who is Black/Ethiopian-American, also brandishes U.S. nationality as a protective resource when confronted; when Israeli police arrest him for live-streamed harassment of an officer, he shouts “America! I’m from America. America, U.S.A!” and later waves his emergency U.S. passport on camera [Israel, Kick, 2024]. Research on Black, women, and queer streamers nonetheless shows that visibility is often treated as riskier and more readily sanctionable for marginalized creators in Western contexts, with less tolerance for boundary-pushing and faster punitive responses (Gray, 2014; Ruberg et al., 2019). The interactive synopticon is therefore stratified across multiple axes. Nationality, platform capital, gender, location, and race shape who is buffered, who is exposed, and how long the accountability lag can last.
Conclusion
This article’s central puzzle was this: if surveillance is commonly expected to enforce conformity, under what conditions does visibility instead encourage deviance? The argument is that IRL live streaming can produce an interactive synopticon, a configuration in which the many watch the few while also intervening through real-time interaction, monetized feedback, and metrified evaluation. In this setting, visibility can shift from deterrent to accelerant.
Three interlocking mechanisms produce this inversion. (a) Live audience steering captures how remote participant-viewers supply prompts and provocations that can be taken up mid-encounter, most consequentially when paid TTS is broadcast into physical space, inserting remote speech into co-present interaction and reorganizing what bystanders must respond to. (b) The reinforcement-amplification loop explains why escalation becomes worth doing and hard to exit: streamers orient to a continuously updated scoreboard (on-screen viewer counts, chat velocity, donation/subscription cues) and treat upward movement as validation, while visible signs of widened circulation expand the audience and raise the incentives to sustain momentum. Because these signals register intensity without moral valence, admiration and outrage alike can function as reinforcement when they keep attention dense and the numbers rising. (c) Lagged accountability explains why the cycle can compound before it breaks: rewards arrive immediately, while sanctions arrive later, often only after circulation and backlash expand beyond the stream’s subculture. The same visibility that incentivizes escalation in the live moment also produces durable records that can later be mobilized as evidence when visibility flips from resource to liability. The interactive synopticon is therefore a probabilistic incentive environment. It increases the likelihood of escalation under specific configurations of audience, affordances, and setting, but does not uniformly transform IRL streams into deviant spectacle. This boundary condition is supported by the negative cases in the analysis, in which provocations produced little metric movement, participant-viewers urged escalation but the streamer deflected, or co-present constraints forced de-escalation before the loop could compound.
The interactive synopticon is also a machine for the commodification of disruption. Debord’s (1994) account of the spectacle argued that social relations under advanced capitalism are mediated through commodified images that displace lived experience; in IRL streaming, the process is more intimate and more interactive. Spectators do not simply consume a pre-produced spectacle; through donations, TTS, and clipping, they participate in producing it, making them co-investors and co-producers of the spectacle they consume. Reactions (laughter, outrage, shock) are simultaneously the raw material and the revenue stream. This commodification dimension matters for governance: platform revenue models treat engagement as value in itself, regardless of how it is produced, structurally incentivizing the conversion of harm into entertainment.
Several governance questions follow directly from the analysis. What duties do platforms owe to bystanders captured in monetized public-space streams without consent? When paid TTS injects threats and accusations into physical space, what moderation (friction, delay, throttling, pre-review) is required, and what liability follows from treating it as “user speech” instead of a platform-delivered intervention? How should responsibility be allocated across streamer choice, paid participant-viewer prompting, and algorithmic surfacing when harms are co-produced?
The analysis reframes responsibility across three co-producers of escalation: (a) performers oriented to engagement signals as evaluative truth; (b) interactive audiences with intermittent intervention capacity, including paid speech; and (c) distribution and monetization systems that reward intensity regardless of valence. The involvement of production teams, including camera crew who track and narrate metrics in real time, further complicates individual attribution. Platforms may need to redesign affordances that facilitate escalation, including stronger friction on abusive TTS and revisiting ranking systems that treat engagement as value in itself even when produced through coercive disruption (Gorwa et al., 2020). Platform governance should avoid treating all “deviance” symmetrically, distinguishing harassment/coercion from non-normative self-presentation in both policy and enforcement design.
The interactive synopticon is also unevenly distributed (across nationality, race, gender, location, and platform capital) in ways that matter for both the mechanics of escalation and the politics of accountability. Deviant IRL streaming is enabled by cross-border mobility. Creators move through public spaces where they are legible as outsiders, while local bystanders bear the immediate costs of disruption and remote audiences and platforms capture value through attention and monetization. This fits accounts of digital colonialism in which lived environments are rendered into monetizable spectacle without meaningful consent (Couldry and Mejias, 2018; Kwet, 2019). Platform asymmetries reinforce these one-directional flows: rewards accrue to streamers and platform ecosystems, while harms are offloaded onto local publics and service workers (Jin, 2013). The selection of non-Western, Asian contexts in this study, namely, Japan, South Korea, and the Philippines, is not incidental. These settings appear in the data as sites structurally characterized by precarious institutional accountability relative to foreign nationals with Western platform capital, a condition that extended the accountability lag and allowed harm to compound before enforcement arrived. This is a finding about the coloniality of the interactive synopticon’s infrastructure. The capacity to externalize disruption onto non-Western or Global South publics while accruing value in Global North platform economies reflects a political economy of extraction that goes beyond individual misconduct.
A second, interactional layer reinforces this structure. Across the corpus, the most reliably “reactive” moments draw on racialized, nationalistic, misogynistic, or sexually coercive provocations calibrated to generate high-arousal responses that clip and travel. These repertoires connect directly to the steering and amplification mechanisms documented in the findings, and they are available precisely because international mobility places these streamers in contexts where such provocations land with maximum force. Future research should examine whether similar incentive structures operate for non-Western streamers in Western publics, and whether marginalized creators’ experiences of punitive visibility intersect with national and platform enforcement regimes differently (Gray, 2014; Ruberg et al., 2019). Related work could pair stream analysis with interviews or surveys to examine what motivates escalation, when audiences shift from encouragement to demands for intervention, and how adjacent practices (swatting, bomb threats via TTS, stream-sniping) further blur lines between audience, streamers, and bystanders.
The broader implication is that platform visibility is not a neutral exposure condition but a shaped and leveraged force. Depending on its interactive and economic configuration, being seen can constrain conduct or intensify deviance. The challenge is to design and govern participatory visibility so that attention supports accountability and constructive creativity rather than harmful spectacle.
Acknowledgements
I sincerely thank Ayat Salih, Halima Aboubakar, Marie-Aminata Peron, and Tomiris Frants for their conversations and insights throughout the research and/or writing process. I am also grateful to Dr. Kristin Plys for encouraging me to pursue the ideas I find most compelling. Thanks also go to the anonymous reviewers and the editorial team for their thoughtful engagement with, and feedback on, my work.
Ethical considerations
This research did not involve interaction with human participants or any private data. All data analyzed (live stream videos, chats, and media reports) were publicly available. No personal identifying information beyond public behavior is disclosed, and no ethical approval or consent was required.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interest
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data for this study consist of publicly accessible live stream recordings, chat logs, and news documentation. Because these materials include offensive content and identifiable bystanders, the raw data are not provided in a repository. Key incidents are described in the article (with citations to public sources) to ensure transparency and reproducibility.
