Abstract
Major open platforms, such as Facebook, Twitter, Instagram, and TikTok, are bombarded with postings that violate platform community standards, offend societal norms, and cause harm to individuals and groups. Managing such sites requires identifying anti-social content and behavior, removing content, and sanctioning posters. This process is not as straightforward as it seems: what is offensive and to whom varies by individual, group, and community; what action to take depends on stated standards, community expectations, and the extent of the offense; conversations can create and sustain anti-social behavior (ASB); networks of individuals can launch coordinated attacks; and fake accounts can side-step sanctioning. In meeting the challenges of moderating extreme content, two guiding questions stand out: how do we define and identify ASB online? And, given the quantity and nuances of offensive content, how do we make the best use of automation and humans in the management of offending content and ASB? To address these questions, existing studies on ASB online were reviewed, and a detailed examination was made of moderation practices on major social media. Pros and cons of automated and human review are discussed in a framework of three layers: environment, community, and crowd. Throughout, the article attends to the network impact of ASB, emphasizing the way ASB builds a relation between perpetrator(s) and victim(s), and how social network conditions can make ASB more or less offensive.
Introduction
Anti-social behavior (ASB) in social media is almost as old as the internet. Its most famous early case is the “Rape in Cyberspace” in LambdaMOO, as described by Julian Dibbell in 1993. Characters in the text-based MOO (multi-user dungeon, object oriented) were hacked and their “bodies” came under the control of a virtual rapist (Buck, 2017; Dibbell, 1993). The details of the virtual attack are hard to read, even at nearly 30 years’ distance, and the emotional impact on the victims was not at all virtual.
LambdaMOO was an early online community, started in 1990 and still in operation. Descriptions of settings, character portraits, and conversation were created with text. Such early, pioneering online communities held utopian ideals for online discourse, democracy, and community—wild, open, commercial-free spaces for unfettered free speech and supportive social interaction. The classic is The Well, as described by Howard Rheingold in his 1993 book, The Virtual Community: Homesteading on the Electronic Frontier (see also Rheingold, 2012). An attack on such a space offended and challenged the utopian ideal of online community. The “Rape in Cyberspace” was a shock and an awakening to the possibilities of ASB online, and the start of discussions and procedures for curbing it. Moreover, it showed that textual communication online, often characterized as a “reduced cues” environment unsuited to emotional connection, could indeed pack an emotional punch.
The “Rape in Cyberspace” demonstrates offense in a number of ways. The content demonstrates sexual harassment and violence against women; the hacking represents an attack on the virtual reality MOO environment; the takeover of others’ accounts represents identity theft; the content and hacking together represent an attack on the LambdaMOO community; and the offender demonstrates a pattern of ASB. Beyond the community, offenses such as these affect trust in social media environments and challenge the limits of the “free speech” heritage of the internet. The ramifications of such attacks are felt at the platform, community, and individual levels, demonstrating the multi-layer approaches needed to manage such behavior.
Among the characteristics needed for communities to succeed is the ability to manage norms and sanction those who violate them. As online environments have developed, so have methods of addressing ASB and the violation of norms. Within the LambdaMOO community, the cyberrape case also triggered a change in administration from “wizard only” management to community management through a petition system (Mnookin, 1996). The LambdaMOO petitions show requests for action on ASBs familiar to any contemporary social media user. Petitions included requests for “the creation of a verb allowing experienced players to ‘boot’ guests off the system for an hour if the visitors behave in inappropriate or annoying ways . . . the creation of a way for players to ban players they dislike from using their objects or visiting their rooms . . . the inclusion of a paragraph in the ‘help manners’ text stating that sexual harassment is ‘not tolerated by the LambdaMOO community’ and may result ‘in permanent expulsion’” (Mnookin, 1996, np). With respect to the virtual rape, the “petition recommended that ‘toading,’ or permanent expulsion, become the recommended punishment for confirmed virtual rapists” (Mnookin, 1996, np).
As social media has become more entrenched in daily life, platforms are also coming to terms with the need to address offensive content and behavior. Due to their reach, much current attention focuses on the major open platforms, such as Facebook, Twitter, Instagram, and TikTok. As discussed further below, these sites face the problems of evaluating the billions of posts made each day, determining which posts are offensive enough to be deleted before release to participants, and deciding how to use automated and human review to manage content. Of concern for others is the sweeping reach of corporate ownership; the invisibility of algorithms used for deleting content; the suffering of human reviewers who view the most offensive posts; and the balance between content moderation and free speech.
This article explores these different sides to the moderation of ASB, starting with the meaning and impact of ASB, and following with a look at current content moderation practices. The article draws on existing studies of ASB online and a detailed examination of current social media moderation practices to evaluate and assess the challenges for ASB management. As described below, both automated and human procedures are used to keep up with postings on social media, with three successive layers of moderation evident: at the environment level, that is, the social media platform, where automation is prominent; at the community level where moderators are at work; and at the crowd level where participants weigh in to alert the community about transgressive material and appeal material that has been classified as offensive.
While much emphasis is given across these layers to the post as a unit of analysis, ASB is found not only in the content of individual posts, but also in interaction among participants. Moreover, how offensive a post is can depend on the meaning of that post to the individual participant, their personal history with such offenses, and the exposure felt by seeing that offense online, with others. Throughout, the article adds attention to the network impact of ASB, emphasizing the way ASB builds a relation between perpetrator(s) and victim(s), and how social network conditions make ASB more or less offensive. The article suggests a number of ways that viewing ASB as a network relation contributes to understanding ASB and its impact on others.
In the following, the first section draws on existing studies of ASB, both offline and online, to address the guiding question: How do we define and identify ASB online? This section looks at what ASB is, how ASB online differs from ASB offline, how ASB offends, and the social network impact of ASB. The article then turns to the pros and cons of current practice in content management as shown in the three successive layers of moderation. Overall, and at each level, the article addresses the guiding question: How do we make the best use of automation and humans in the management of offending content and ASB?
What Is ASB, and ASB Online?
In brief, anti-social acts encompass various flavors of online behavior: from trolling, hate speech, spamming, and cyberbullying to the (perhaps milder) acts of impoliteness, rudeness, incivility, offensive comments, and stereotyping (for more on studies of these acts, see Gruzd et al., 2020). On social media platforms, anti-social offenses include posting content that violates community standards. For example, the Facebook Community Standards (Meta, 2022c) list includes: Adult Nudity & Sexual Activity; Bullying & Harassment; Child Endangerment: Nudity and Physical Abuse; Child Endangerment: Sexual Exploitation; Dangerous Orgs: Organized Hate; Dangerous Orgs: Terrorism; Fake Accounts; Hate Speech; Regulated Goods: Drugs; Regulated Goods: Firearms; Spam; Suicide and Self-Injury; Violence and Incitement; and Violent & Graphic Content.
Classifications of ASB offline provide a useful categorization of ASB based on the target of such behavior: personal, against a specific person; nuisance, against a type of group; and environmental, in physical spaces (e.g., Metropolitan Police, 2023). Similarly, online ASB can be distinguished by the intended targets of the abuse, for example, personal cyberbullying, “nuisance” hate speech, and trolling in the environmental social space (Table 1).
Table 1. Types of Anti-social Behavior. (ASB: anti-social behavior.)
What Is Different About ASB Online?
ASB exists offline and online, but attacks via the internet differ in a number of ways. ASB online is:
Anonymous and distant. From the anonymity that physical separation grants the attacker in relation to their victim, to anonymous and fake identities.
Ubiquitous and pervasive. Perpetrators can be ever-present through every device, medium, online group, and community, generating attention with every beep of the phone, push notification, and advertising popup on phone, tablet, and computer.
Distributed global reach and influence (e.g., QAnon conspiracy theories).
Anytime, anyplace. Online intimidation can combine with offline harassment, for example, cyberbullying added to daytime bullying of teens in schools.
Contagious. Observing ASB online can carry over to face-to-face interaction (Khazan, 2022).
Persistent. The life of a posting differs by medium, but some content can remain persistently available.
How ASB Offends
The offensiveness of the ASB, whether personal, nuisance, or environmental, depends on context, and the meaning of the offense to the victim. Studies of violence find that “[f]our elements are crucial in grappling with the meanings of violence: (1) the act itself; (2) the relationship of the participants to each other; (3) the location of the act; and (4) the outcome or the resultant damage” (Stanko, 2003, p. 11). Similarly, for the online offense, the act, the relationship of perpetrator(s) to victim(s), the location in a social media community, and the impact are elements that affect the meaning of the offense. The following explores some further characteristics of and considerations for online ASB, with particular attention to the relationship between participants and the ties that bind these participants in a network predicated on the ASB acts. These characteristics show that the impact of ASB stems from more than content alone, and involves both individual and network effects.
ASB Is Relational and Creates an ASB Network
A distinctive attribute of each kind of ASB is that it targets others; that is, it aims to create a connection between the perpetrator and others. The original attack is directed at a person, population, community, or environment. Attacks can be directed at named others: individuals (personal); a group identifiable by some common characteristic such as race, gender, or nationality (nuisance); or an identifiable network or social media platform (environmental).
Each ASB act creates a connection—a social network relation and tie—between the initiating actor (the poster, creator, ASB perpetrator), their direct target(s), and the audience that witnesses the act. An ASB network is formed from the interactions among these actors, creating a set of interconnected actors tied by ASB.
Viewing ASB online as a network suggests exploring the patterns of interaction for common characteristics that indicate ASB is playing out in a community, for example, to see if perpetrators occupy central positions in the network, or what kind of ASB act results in responses. Where social media does not provide direct identification of who a post is intended for, use of common words or names in the text may serve to reveal network structures (see Gruzd, 2016). (For an introduction to social network analysis, see Scott & Carrington, 2014, 2017.)
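To make the idea concrete, the following is a minimal sketch of how such a network might be extracted from posts, assuming posts arrive as (author, text) pairs and that an “@name” convention marks the target. The data, the mention convention, and the use of the networkx library are illustrative assumptions, not a description of any platform’s actual tooling.

```python
# Minimal sketch: build a directed interaction network from post mentions.
# Assumes posts are (author, text) pairs and "@name" marks a target;
# real platforms would use their own reply/mention metadata.
import re
import networkx as nx

posts = [
    ("troll42", "@alice you are clueless"),
    ("troll42", "@bob nobody asked you"),
    ("alice", "@troll42 please stop"),
    ("carol", "@alice ignore them, you made a good point"),
]

g = nx.DiGraph()
for author, text in posts:
    for target in re.findall(r"@(\w+)", text):
        # Accumulate a weight so repeated posts strengthen the tie.
        w = g.get_edge_data(author, target, {"weight": 0})["weight"]
        g.add_edge(author, target, weight=w + 1)

# Actors with high weighted out-degree direct many posts at others --
# one candidate signal (not proof) of a perpetrator position.
print(sorted(g.out_degree(weight="weight"), key=lambda x: -x[1]))
```

Run on these toy posts, troll42 tops the ranking; on real data, such rankings would only flag candidates for human review, not identify perpetrators outright.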
ASB Is Intentional
ASB acts are intentional. Perpetrators intend to elicit a response from the subject of the attack, that is, from the targeted individual, defenders of the named population, or representatives of the environment. While at one end some hackers may want to stay below the radar, at the other end trolls thrive on getting a rise from others. They seek to create a reciprocal relationship with the victim(s) (or their supporters), and to strengthen that tie by encouraging repeated interaction.
ASB Is Emotional
ASB perpetrators intend to elicit an emotional response of anger, fear, distress, or disgust in the intended target. Experience of emotion—or rather, lack of it—is integral to the perpetration of ASB. Sest and March (2017) find that trolling behavior is positively associated with knowing how others will feel (cognitive empathy), but negatively associated with experiencing the feelings of others (affective empathy); that is, trolls are more able to understand how others will feel, but less able to empathize with that feeling. Furthermore, the kind of empathy at work may differ online versus offline; Sest and March report on studies showing that affective empathy predicted traditional bullying, but both affective and cognitive empathy predicted cyberbullying.
ASB Generates a Personal Response
A victim’s response is emotional and is shaped by more than just the meaning of a text—“text” used in the literary sense of “any object that can be ‘read,’” thus covering images, videos, and so on. Literary theories that emphasize the transactional nature of interaction between the text and the reader, such as reader response theory (Rosenblatt, 1978) and radical change theory (Dresang, 1999; Dresang & Kotrla, 2009), stress that the meaning of a text cannot be separated from the reader and how the reader interacts with the text. Victims carry their experience into their reaction to the text. Thus, the impact of ASB can arise from both the text and attributes of the actors in the network, such as their experience of ASB, membership in a targeted group, or history with the community and network.
Emotion is further affected by who is witness to the attack. The network configuration places both the perpetrator and the victim in the company of network members. The emotional intensity of the attack can be enhanced or moderated by who witnesses the attack, and who is thought to support the victim and/or the perpetrator. For victims, the impact of being there with others has parallels with the concept of contextual integrity (Nissenbaum, 2011), originally applied to understanding privacy concerns. Instead of considering privacy as one rule to be applied to all data, Nissenbaum argued that the response to sharing of private information depends on whom it is shared with. An individual may find it completely appropriate to share private information with family members, but not workmates, or medical information with a specialist, but not family. Thus, along with “reader response,” there is what we might call a “reader response with an audience,” that is, how a text is received by an individual is tempered by who else is seeing the text with them and seeing them see the text.
ASB Is Performance
ASB acts are performances, directed to an audience beyond the target (Kernaghan & Elwood, 2013). ASB perpetrators are performers aware of the emotions the performance will engender, and aware of the audience that will witness these attacks, whether that is the victims or fellow bullies. “Brigading” attacks involving multiple perpetrators and posts are performed for others in the brigade.
Performance Roles
A variety of names can be applied to the members of a social media network: poster and reader; poster and audience; moderator; participant; observer; lurker. For ASB networks, the terms become perpetrator and victim, anti-social poster and target. The different names are not just linguistic variety; they signify different roles—social network roles—emerging in the network. These are not roles conferred by organizational designation, such as moderator; they emerge from patterns of interaction and response in the network. Persistent actor roles and positions can be identified by common patterns of interaction with others, whether particular actors in relation to particular others (e.g., different teachers to one cohort of students) or a common pattern of actors in relation to others (e.g., any teacher to any set of students). Social network measures relating to roles and positions may provide a way to discover ASB perpetrators through their patterns of interaction with others rather than through the texts of their postings.
Summary
This section has emphasized the behavioral aspect of ASB, and how the impact of ASB is defined by more than content. Individual reactions are mediated by individual, group, and community history of exposure, the presence of others (contextual integrity), and community expectations about content and behavior. It also emphasized the social network aspects stemming from the tie built between perpetrator and victim, further identifiable by relational type, direction, intensity, and so on; the actor roles and positions generated by ASB performance and by individual, group, and community reaction (e.g., ASB first responder, unofficial moderator, history keeper, central or peripheral positions); and network configurations such as cliques (e.g., brigades or sets of ASB responders).
Attention to behavioral responses suggests examining the presence or absence of emotional text to identify discussion of sensitive topics. Combining this with examination of the network configuration may help show whether this is an ASB attack on a sensitive topic or a discussion of that topic. The former may show a central actor receiving multiple targeted responses; the latter, a more equal distribution of responses among actors.
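As a rough illustration, the sketch below scores a thread on these two signals: how concentrated responses are on a single target, and how much negative language they carry. The toy word list, the scoring, and the example data are illustrative assumptions; a working system would use a trained sentiment model and the platform’s actual reply metadata.

```python
# Sketch: is a thread a pile-on against one person, or a heated but
# distributed discussion of a sensitive topic? Lexicon and data are toys.
from collections import Counter

NEGATIVE_WORDS = {"idiot", "clueless", "pathetic", "stupid"}

def thread_profile(replies):
    """replies: list of (author, target, text) tuples."""
    targets = Counter(target for _, target, _ in replies)
    total = sum(targets.values())
    # Share of replies aimed at the single most-replied-to actor.
    concentration = targets.most_common(1)[0][1] / total
    # Share of replies containing at least one negative word.
    negativity = sum(
        any(w in text.lower() for w in NEGATIVE_WORDS)
        for _, _, text in replies
    ) / total
    return concentration, negativity

replies = [
    ("u1", "alice", "What an idiot take"),
    ("u2", "alice", "Pathetic, just stop"),
    ("u3", "alice", "You are clueless"),
    ("u4", "u1", "Disagree, she has a point"),
]
conc, neg = thread_profile(replies)
# High concentration AND high negativity suggests an attack on one actor
# rather than a distributed debate.
print(f"concentration={conc:.2f}, negativity={neg:.2f}")
```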
The next section discusses the current state of content moderation by major social media platforms, showing the breadth of the problem and the pros and cons of measures taken to manage offending content.
Management and Moderation of ASB
In the context of social media, the overall term content moderation describes the process of receiving, evaluating, and making decisions about the further dissemination of contributions in open, online environments. By definition, moderation refers to managing extremes, and in social media this means removing the extremes of offensive content and/or offensive perpetrators. Content moderation exists in many environments, often associated with a human community moderator who keeps discussion on topic within a specific forum. The process takes on much larger proportions in social media platforms such as Facebook, Instagram, Twitter, and TikTok, where human moderation alone is insufficient to deal with the billions of postings on these platforms. Algorithms are needed to evaluate and take action on the majority of postings based on examination of content. Humans take on assigned roles for the platform as reviewers of content that has passed through the algorithm but remains ambiguous for action, and as moderators of content and conversation at the community level. Human participants voluntarily take on further evaluation of content by notifying the platform about unwanted content, or by appealing to community moderators for regulation of conversations.
The following discusses current content moderation processes around a framework of three successive layers of moderation—environment, community, and crowd—each targeting different kinds of content, with materials filtered through from one layer to the next (Table 2). While these might loosely be aligned with environment, nuisance, and personal designations of ASB, there will be much overlap in categories across the layers of moderation.
Table 2. Layers of Moderation.
Automation, Human Review, and Moderation
Automated solutions support hands-off management and decision-making, using platform-defined algorithms to identify offensive content in individual posts and take action based on platform-defined rules. Human solutions support evaluation of content in context and action based on local norms. But the work is not so clearly divided. Algorithms take the brunt of the work, filtering egregious content before it is passed to human reviewers, moderators, and readers. Humans design the technology, create the algorithms, and establish the social norms of the platform; humans evaluate ambiguous content and interpret offensiveness in relation to societal, legal, business, and community contexts; and humans read, post, and participate online, practice prosocial and ASB, and sanction others online. The appropriateness and usefulness of automated versus human review varies by the layer of moderation: environment, community, and crowd. The following sections provide more detail on practices and challenges at each layer.
Environmental Moderation
Contemporary management of ASB on social media begins with environmental protections, for example, evaluation of the content of each post—by algorithm and by human reviewers—followed by censorship of undesirable content that violates platform standards (either before or after human reviewers examine edge cases), and action against repeat offenders. The automated and human review happens before posts are made public, and determines which posts should or should not be released to the public (Meta, 2022b: Taking Action).
It is tempting to say that social media platforms act on behalf of readers by removing unwanted content. But two points can be made. First, in the United States, Section 230 of the Communications Decency Act gives these platforms a free pass for the content posted by users—so they do not have to do this. Second, revenue is driven by advertisements that, in turn, require users who click on the links (Gary & Soltani, 2019). To be profitable, the platform must keep users on the site, which means taking actions that censor and reduce exposure to controversial content. Social media censorship is, in effect, good business and good corporate social responsibility (Gillespie, 2018; Grygiel & Brown, 2019; Oremus, 2022). Indeed, it may be considered the business of social media platforms: “social media platforms promise to rise above it [the unwanted content] by offering a better experience of all this information and sociality: curated, organized, archived, and moderated” (Gillespie, 2018).
The joint use of automated and human review aims to manage both the vast quantities of data to evaluate and the edge conditions that need human examination. As described further below, automated procedures have become even more necessary given the negative impact on humans of reviewing posts of violent and graphic material (Roberts, 2021), with class action suits brought against social media companies by humans suffering from post-traumatic stress disorder (PTSD; BBC, 2020).
Environmental Moderation I: Automation
The management of egregious content captures most of the attention at the social media platform level. Content is first sorted by algorithms for action: censor, flag for human review, or release. The list of topics to take action on is extensive and covers a wide range of content that offends or does harm in relation to societal, political, legal, and/or psychological norms. For Facebook and Instagram, the Meta Transparency Center (Meta, n.d.-b) provides quarterly statistics on what kinds of content are flagged as violating community standards, and the prevalence of these violations. Table 3 shows data on Proactive Rate, detection and removal prior to public posting, and Prevalence, the estimated percentage of all views that were of violating content, for each type of violation for the first quarter (Q1) and fourth quarter (Q4) of 2022. The Community Standards Enforcement Report (Meta, 2022b) also includes data on Content Actioned, “the number of pieces of content (such as posts, photos, videos, or comments) or accounts we take action on for going against our standards,” and how many are Appealed, Restored with Appeal, or Restored without Appeal. (For more details, see “Learn about our measurements” on Meta, 2022b, Community Standards Enforcement Report page.)
Table 3. Facebook, Q1 and Q4 2022: Proactive Rate (Detection and Removal Prior to Public Posting) and Prevalence (Expected Views of Violating Content Out of All Views).
Source. Data for Proactive Rates on Facebook are from August 2022 and March 2023 downloads of CSV file on the Community Standards Enforcement Report (Meta, 2022b).
Data on Prevalence are taken from “Recent Trends” pages (Meta, 2022f) for each violation on the community standards enforcement page, for example, https://transparency.fb.com/data/community-standards-enforcement/adult-nudity-and-sexual-activity/facebook/
While such content has offline equivalents, the problem for ASB online is the scope and reach of this behavior, and its deleterious effect on social media use and users. The number of cases is quite staggering. In the first quarter of 2022, action was taken on 3,551,900,000 postings of content that violated the policies of Facebook, and on 3,235,206,700 in the fourth quarter of 2022 (the sum across all categories of Content Actioned for the quarter). This does not include cases for Instagram, also operated by Meta; nor does it include postings on other social media.
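To keep the reported measures straight, here is a small arithmetic sketch of how Proactive Rate, Prevalence, and the Content Actioned sum relate. All figures are invented for illustration; the actual per-category data are in Meta’s CSV downloads.

```python
# Sketch of the headline measures described above, with invented figures.
content_actioned = {          # pieces actioned per category (invented)
    "Hate Speech": 10_600_000,
    "Bullying & Harassment": 9_500_000,
    "Spam": 1_800_000_000,
}
found_proactively = {         # flagged before users reported it (invented)
    "Hate Speech": 8_700_000,
    "Bullying & Harassment": 5_800_000,
    "Spam": 1_790_000_000,
}

for cat, actioned in content_actioned.items():
    proactive_rate = found_proactively[cat] / actioned
    print(f"{cat}: proactive rate {proactive_rate:.1%}")

print(f"total actioned: {sum(content_actioned.values()):,}")

# Prevalence is estimated separately, from sampled views of content.
violating_views, sampled_views = 11, 10_000   # invented sample
print(f"prevalence ~ {violating_views / sampled_views:.2%}")
```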
The major social media sites have each developed lists and technologies to maintain community ethos. In Reddit, the categories for deletion are contained in a set of rules of conduct found in the policy pages (Reddit, 2022b). As with Meta, “anti-evil” technology flags content for human review, and two in-house teams review content.
The job of policing Reddit’s most pernicious content falls primarily to three groups of employees—the community team, the trust-and-safety team, and the anti-evil team—which are sometimes described, respectively, as good cop, bad cop, and RoboCop. Community stays in touch with a cross-section of redditors, asking them for feedback and encouraging them to be on their best behavior. When this fails and redditors break the rules, trust and safety punishes them. Anti-evil, a team of back-end engineers, makes software that flags dodgy-looking content and sends that content to humans, who decide what to do about it. (Marantz, 2018)
TikTok, owned by ByteDance Ltd, also has a long list of types of posts that violate community guidelines; of the three platforms discussed here, its list is the only one that specifically includes animal abuse (under Violent and Graphic Content). TikTok has also recently created a transparency center with removal reports by type of violation (TikTok, 2022a). The list of monitored content includes: Minor Safety [under 18]; Dangerous acts and challenges; Suicide, self-harm, and disordered eating; Adult nudity and sexual activities; Bullying and harassment; Hateful behavior; Violent extremism; Integrity and authenticity; Illegal activities and regulated goods; Violent and graphic content [includes animal abuse]; Copyright and trademark infringement; Platform security; and Ineligible for the For You Feed.
TikTok uses a mix of technology and human moderation and encourages members to use the provided tools to report content they believe violates the Community Guidelines (TikTok, 2022b).
Challenges to Automated Detection
Volume, Velocity, Variety, Veracity, Value
The management of social media data is challenged by the five Vs of big data: volume, velocity, variety, veracity, and value. Social media certainly qualifies as big data in terms of volume and velocity. Variety is present in the forms of content—text, image, audio, video, and combinations—and associated data structures. Veracity for social media implies judging which posts are appropriate and which are to be filtered by the automation algorithm. The prevalence of ASB is one measure of the veracity of the data stream (Table 3). Value implies the benefit to the social media owners, businesses making use of social media, and the social media audience (users, participants). Challenges include creating review structures for violating and anti-social content: determining the appropriate categories, creating algorithms, removing content, and establishing human review and appeal systems.
Internet Culture
Difficulties arise in balancing “free speech” with the censorship of ASB; managing ambiguity and uncertainty in language use; and covering variety in the use of language and image. As Roberts points out, a major challenge is balancing the tension between the tradition of the internet as a “wild, open space,” where free speech is heralded as the norm, and the realities of how social media have become more entrenched as an everyday social, learning, and work space (Roberts, as quoted in BBC, 2020).
Ambiguity and Bias
Inherent in any decision-making is the possibility of error as false positives (deleting what should not be deleted) and false negatives (approving what should not be approved). Attention to social media edge conditions has focused on the systemic mistakes that appear to silence some voices. As with many technologies, designer error can build in bias, amplifying societal prejudices, and silencing discussion of the sensitive topics that are the subject of filtering (Cobbe, 2021; Hampton et al., 2014; Sap et al., 2019).
A further difficulty for automated detection is keeping up with the velocity of language evolution; resistance by groups and communities; texts associated with different genres and media (e.g., memes, parody, satire, special effects; see TikTok’s “effects” center); and deliberate obfuscation of meaning, for example, “[s]arcasm, irony, humour, hyperbole, metaphor, allusion, and double meanings are common . . . and difficult for natural language processing systems to navigate” (Cobbe, 2021). Where automated detection is seen as censorship, forms of “everyday resistance” are used. Cobbe (2021) lists these tactics: switching communications channels to evade internet censorship in China; using technical means to evade platform restrictions by creating new versions on different devices, or posting screenshots or excerpts, as done to share videos of the attack on a mosque in Christchurch, New Zealand, in March 2019; and disguising the text with “obscured fonts, wonky text, or backwards writing” (as done by QAnon conspiracy theorists).
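A small sketch shows one simple counter-measure to such obfuscation: normalizing text into candidate forms before matching against a term list. The character map, the reversal check, and the placeholder blocklist are illustrative assumptions only; production systems need far broader Unicode, language, and media coverage.

```python
# Sketch: normalize deliberately obfuscated text before filtering.
# Handles two of the evasion tactics named above: character swaps
# ("leetspeak") and backwards writing.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)

def normalize(text: str) -> str:
    t = text.lower().translate(LEET_MAP)
    return "".join(ch for ch in t if ch.isalnum() or ch.isspace())

def candidate_forms(text: str):
    """Yield variants of a post to check against a blocklist."""
    norm = normalize(text)
    yield norm
    yield norm[::-1]  # backwards writing

blocklist = {"hate speech example"}  # placeholder term list
post = "h4te $peech ex4mple"
print(any(form in blocklist for form in candidate_forms(post)))  # True
```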
Whether political, historical, or situational, context is the greatest challenge to automated methods for interpretation of acts such as hate speech and bullying. Human review helps bridge that gap for societally inappropriate behavior, and more nuanced evaluation at the community level comes down to moderators.
Censorship, and Invisible Command and Control
Concern has been raised about the effects of the non-transparency of the algorithms used for censoring incoming posts (Cobbe, 2021). That the algorithms used for detection are not available is, quite legitimately, intended to prevent circumvention and thus is not a negative aspect of algorithmic decision-making. However, it raises questions about their construction: who is creating these algorithms, and hence what unconscious biases may be instantiated in design? Who is reviewing these algorithms and how often? Who decides what is censored? Is the platform in control of censorship, or are rules swayed by government control or influence, or by platform stakeholders’ biases or influence? How is privacy protected when algorithms and reviewers are reading the content, not just the metadata, of each post (Cobbe, 2021)? Where law has not kept up with new technologies, what corporate social responsibility do social media platforms have (Grygiel & Brown, 2019)?
There is also concern that censoring posts will create a separation of online spaces into “AI-patrolled and regulated ‘safe spaces’ separated from free-for-all zones” (Rainie et al., 2017, np). The separation seems to serve both those who prefer to hand over control to a platform and those who prefer no constraints on their content. The latter may, at first glance, seem a way to maintain the freedoms of the open internet. However, it can lead to further separation of readers into filter bubbles where they can hear and pursue ASB with a like-minded audience.
Environmental Moderation II: Human Content Reviewers
While evaluation of content at the social media platform level relies on automation, human reviewers still play a major role (Nguyen et al., 2020). In Q1 2022, Meta reported using 15,000 human reviewers worldwide, reviewing in 50 languages. Automation identifies content that fits each content classification and prioritizes review to humans with expertise in certain policy areas and regions (Meta, n.d.-a, Detecting Violations). The type of expertise includes knowing the language, subject matter, and Facebook policy (Meta, 2022e: How Review Teams Work).
Success rates show the efficacy of the two-mode evaluation. In Q4 2022, the percentage of such material caught before posting (by AI or reviewers) was between 81.9% and 99.3%, except for Bullying & Harassment at 61% (Table 3). Over the last 4 years, detection of Hate Speech and Bullying, two types of content that are particularly challenging to identify accurately, has improved the most: in 2018, only 38% of hate speech was detected before users reported it, versus 81.9% in Q4 2022; in 2019, only 14.4% of Bullying was detected before release, versus 61% in Q4 2022 (Meta, 2022a: Challenge of Detecting Hate Speech). Data on how much of each category is managed by the technology and how much by humans were not found; nor was information on what exactly is identified as hate speech or bullying by Meta or other social media, and what is identified as such by users after posting. Both kinds of information would shed further light on how content is managed, and on the possible errors of omission or commission that occur.
Beyond content, Meta also uses a “cross-check system” that aims to mitigate the risk of inappropriate removal of non-violating content. When the system started in 2013, reviewers examined the risk based on the “type of user or entity (e.g., an elected official, journalist, significant business partner, human rights organization) [identified from a compiled list], the number of followers, and the subject matter of the entity.” In 2020, the evaluation of risk was updated to rely on “three factors: (1) the sensitivity of the entity, (2) the severity of the alleged violation, and (3) the severity of the potential enforcement action.” Lists are still used for the entity, but with enhanced updating and re-evaluation. (For more details, see Meta, 2022g, Reviewing High-Impact Content Accurately Via Our Cross-Check System.)
Challenges for Human Content Reviewers
Trauma and Working Conditions
The greatest challenge to human review of egregious content is the trauma associated with the exposure to harmful texts, images, videos (BBC, 2020; Roberts, 2021). Reviewers in Facebook and TikTok report suffering from PTSD as a result of their work. In May 2020, Facebook settled a class action suit for US$52 million brought by human reviewers who “alleged that reviewing violent and graphic images—sometimes of rape and suicide—for the social network had led to them developing post-traumatic stress disorder” (BBC, 2020). However, their employers may not even be the social media companies themselves; third-party companies can set conditions of long hours and quotas that further traumatize workers.
Trauma arises not just from viewing content that is removed; violent content that is not removed also takes its toll. Despite non-disclosure agreements, a few reviewers are speaking out about policies that leave distressing content in place. Animal abuse and cruelty, for example, are not included among the areas that Meta’s algorithms monitor.
Shawn [Speagle] believes Facebook moderation policies should be talked about openly, because staff end up watching upsetting content that is often left untouched on the platform. As an animal lover, he was distraught that animal content “was, for the most part, never accelerated in any way shape or form,” meaning that it was never referred for removal. (Davidovic, 2019)
Although reviewers are employed globally, the Facebook (Meta) agreement noted above included only US-based reviewers.
The agreement covers moderators who worked in California, Arizona, Texas and Florida from 2015 until now. Each moderator, both former and current, will receive a minimum of $1,000, as well as additional funds if they are diagnosed with PTSD or related conditions. Around 11,250 moderators are eligible for compensation. (BBC, 2020)
It is not just Meta. Two former TikTok reviewers recently initiated a class action suit against the company.
The two women behind the suit, Ashley Velez and Reece Young, both worked as TikTok moderators on contract through third-party companies, but claimed TikTok controlled the rules of their work day-to-day. They also alleged that the videos and photos they were forced to review, combined with strict schedules, consistent 12-hour days and aggressive quotas, left them traumatized. (Irwin, 2022)
Compensation and support do not alleviate the problem where “every day of work is a ‘nightmare’” (Criddle, 2021), and the content still needs to be reviewed. Even where there are not yet lawsuits, the same trauma is present for human reviewers on other platforms. A reviewer who works for Twitter’s “spam project,” whose team members are “spam moderators,” comments that “[l]ooking at pornographic and violent content for hours each day takes its toll on his team’s mental health” (Meaker, 2022). The reviewers are employed by Innodata, a New Jersey-based business with 4,000 employees around the globe. Despite the toll on mental health, “Innodata does not provide mental health or counseling support to its employees, he claims” (Meaker, 2022).
Further complications arose during the Covid pandemic and the ensuing requirement to work from home. In their online information, both Meta and TikTok noted how Covid reduced their ability to proceed with normal practice: Meta noted that appeals could not be handled; TikTok noted that the implementation of the transparency center was delayed. Perhaps most revealing of the nature of the content that reviewers see every day is that human reviewers working from home could not safely review postings there. Automated means of review were extended to cover this loss (BBC, 2020; Dwoskin & Tiku, 2020).
Environmental Moderation III: Crowd Content Review
The third level of content review for the environment involves end users who see the outcome of earlier phases of automated and human review. These users see and appeal decisions about released content they find offensive, and decisions about their own content that has been removed. Offensive posts that are public may be appealed by any number of users. Conversely, appeals about removed content can only be submitted by the poster. Errors in either direction affect trust in the decision process and the environment, but also reveal shortcomings in algorithms and review criteria, and ambiguity in content that is difficult to assess. For example, words that identify hate speech can be the same as those used by victims to describe abuse; and widely used corpora of toxic language have been found to be twice as likely to identify African American English as offensive due to differences in dialect (Cobbe, 2021; Sap et al., 2019).
The role of end users is often overlooked in bureaucratic views of social media moderation. Yet their appeals act as another layer of defense against ASB. The statistics show how important that role is for the more nuanced categories such as Bullying & Harassment: in Q4 2022, 61% of such content was removed before it reached the crowd; hence, 39% is (presumably) left to be identified by the crowd. Such feedback provides a measure of the efficacy of the platform-level measures against ASB. Crowd reports of offensive content provide information on rates of false negatives (content released that should not have been), while appeals of removals provide information on rates of false positives (content removed that should not have been).
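A sketch of how such crowd-derived error rates might be computed, using the terms defined earlier (false positive: wrongly removed; false negative: wrongly released). All counts are invented.

```python
# Sketch: estimating moderation error rates from crowd feedback.
removed = 100_000             # posts removed by platform review (invented)
restored_on_appeal = 4_000    # removals overturned after poster appeals
released = 2_000_000          # posts released to the crowd (invented)
crowd_flagged_valid = 9_500   # released posts later removed via user reports

false_positive_rate = restored_on_appeal / removed    # over-removal
false_negative_rate = crowd_flagged_valid / released  # under-removal
print(f"FP rate: {false_positive_rate:.1%}, FN rate: {false_negative_rate:.2%}")
```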
Challenges for Crowd Review
Mechanisms for reporting inappropriate content or behavior on social media are generally oriented to single incidents, for example, appealing a content decision, reporting an account or group, or reporting an offensive post, comment, or video (see, for example, Meta, 2022d: How to appeal to the Oversight Board; Facebook (n.d.), Report Content on Facebook; Reddit (2022c, n.d.), How do I report a user?; TikTok (n.d.), Report a problem). However, single incidents do not tap into coordinated actions, such as brigades. Furthermore, it appears that the crowd has no view of the restrictions on accounts (e.g., Meta (2023), Restricting accounts), and thus no view of the full online behavior of others should they want to report an account. Such information is kept, since it is used to determine restrictions on an account, but it is accessible only to platform operators. Overall, the crowd is operating from a very restricted view of the whole of an account or group, and of the dynamic across the platform.
Community Moderation: Moderators and Crowd
In dealing with ASB content, community moderators perform the next layer of evaluation after the environment level evaluation by algorithm and human reviewers. Moderators judge postings by their appropriateness to the forum, for example, regarding topic, tone, humor, ways of supporting arguments, and so on. As with other reviewers, they may be able to censor postings and/or individuals when the community has a pre-approval process for posting.
However, unlike content reviewers, community moderators are embedded in the community, and their identity is visible to community members. They have a duty to the community and community members, while also maintaining their own presence as a participant in the community. When a post makes it past the moderator, the content, the identity of the poster, and the moderator’s decision become visible to platform participants, and the post becomes an entry in the online community. Rather than anonymous one-way decisions (before appeals), this is a multi-way interaction between named moderator(s) and community members, managed with consideration of context, actors, and actor relationships. Moderators’ interventions are seen by members of the community, as they act as arbitrators and motivators for appropriate behavior. Where content is evaluated before posting, moderators are in the unique position of seeing both published and unpublished postings, that is, both the visible and invisible parts of the community social network. Report mechanisms, such as visualization of that network, might help moderators know whose postings to monitor most, and/or what topics to monitor.
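A minimal sketch of what such a moderator-facing view might look like, assuming interaction data is available as weighted directed edges; the networkx and matplotlib rendering and the scoring by outgoing posts are illustrative choices, not an existing moderator tool.

```python
# Sketch: visualize who directs posts at whom, sizing nodes by how many
# posts they send; data are invented for illustration.
import networkx as nx
import matplotlib.pyplot as plt

g = nx.DiGraph()
g.add_weighted_edges_from([
    ("troll42", "alice", 5), ("troll42", "bob", 3),
    ("alice", "troll42", 1), ("carol", "alice", 2),
])

out_strength = dict(g.out_degree(weight="weight"))
sizes = [300 + 200 * out_strength.get(n, 0) for n in g.nodes]

nx.draw_networkx(g, node_size=sizes, with_labels=True)
plt.title("Posts directed at others (node size = posts sent)")
plt.show()
```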
As well as named moderators, participants in the community can be active in managing in-community discourse. Such crowd moderation appeals to the ideas of participatory democracy—crowds enacting public discussion of political and social issues toward the development of policy. Mnookin (1996) provides an eloquent description of the development of participatory democracy in the case of LambdaMOO, when it transformed from “wizard only” management to community management in 1993:
At its outset, LambdaMOO was an oligarchy without any formal system for resolving controversies or establishing rules. The oligarchs—MOO-founder Pavel Curtis as well as several other players who had participated in LambdaMOO since its infancy—were known as “wizards”; they were responsible for both technical integrity and social control on the MOO. “The wizards were benevolent dictators . . .” In what Curtis hoped would be “the last socio-technical decision imposed on LambdaMOO by wizardly fiat,” the oligarchs instituted a petition system, a process through which the players in LambdaMOO could enact legislation for themselves. (Mnookin, 1996)
Despite the rough and tumble of online discourse, the idea of management by the crowd is still an ideal for online communities. Crowd moderation shows the community network at work, supporting, censoring, arguing, all within the accepted framework of the community. Maintaining a successful online community has been the subject of many studies, exploring social structures, motivations, and design (e.g., Gilbert, 2018; Haythornthwaite, 2007; Kraut et al., 2011; Preece, 2000; Rainie & Wellman, 2012). At the community level, both moderators and participants set the expectations and practices that define their community. The community reinforces standards by a process of structuration (Giddens, 1984) as they adhere to standards, mentor others, and sanction transgressions.
However, crowd moderation alone is often insufficient. Communities can come off the rails without oversight; as noted below, the large majority of subreddits removed by administrators were taken down due to a lack of active moderation (Statistica, 2022).
Local Context
A moderator’s primary goal is to interpret postings in relation to local context. Context comes into play not just for determining if a posting violates norms, but also for individual and community response to that violation. Context exacerbates or mitigates the impacts of ASB such as harm, embarrassment, fear, disgust, and so on. The response to ASB depends in part on a tiered layer of norms: societal norms and laws; social media norms and rules regarding content (as described in platform standards or policies); in-network norms, etiquette and rules (e.g., for subreddits within Reddit).
Community moderators differ from human platform reviewers in their role as interpreters of the “texts” on behalf of the readers and in reference to the local, community culture. Moderators stand in loco parentis—as parents—for readers of the site. Their emotional, contextual, and literary response to ASB situates them as proxies for the readers, and also as proxies for the emotional reaction of readers. Reader response theory posits that the way individuals interpret and react to text cannot be disentangled from their own experience. Thus, the community moderator’s task is not just to have their own reaction to material, but to take on the emotion of the readers—a reader response on behalf of the audience. The burden and responsibility are particularly high for community moderators because they are not anonymous, and not shielded by platform bureaucracy from those disagreeing with their decisions.
Communities also have ways of behaving that play into how moderators “read” the texts.
Is there an expectation of community engagement? Or of quality in posts? Ribeiro et al. (2022) find that approving posts before publication results in fewer, higher-quality posts and a more engaged readership (more comments, reactions (e.g., likes), and time on posts). Context also determines the expectation of emotional response: for example, are people there to be exposed to others’ points of view, and thus potentially offended? Is there an expectation that expressions of frustration, anger, or shame will be tolerated? Is free speech the driving consideration? For example, Reddit’s AITA (“Am I the Asshole?”) offers a site where emotional response is expected.
A catharsis for the frustrated moral philosopher in all of us, and a place to finally find out if you were wrong in an argument that’s been bothering you. Tell us about any non-violent conflict you have experienced; give us both sides of the story, and find out if you’re right, or you’re the asshole. (Reddit, 2022a)
Modeling Community Practice
Moderators are not just reactive. As accomplished members of the community, they model community practice and educate new participants. Educating participants in the ways of the community becomes part of including participants in the norms of the collective as defined by its topic of discussion, medium of interaction, and norms of language, image, and video use. Human moderators come into play as managers of the collective through informed, nuanced understanding of the purpose of the collective and of the limits to tolerance of transgressive behaviors. Since topics vary, as do the kinds of discussion patterns expected, moderators are continuously interpreting community standards, readjusting expectations to topics as they appear.
Audience Typology
In-network norms may be taken as a baseline of what audience (network) members expect. Yet these are created and maintained by this very network. Thus, in a deliberately circular definition, the audience is those who use the network. This is structuration (Giddens, 1984): actors create the norms through performances of the norms, thereby setting and reinforcing the norms. This self-fulfilling definition does allow for change: norms may evolve through new interactions and conversations; actors may object to that evolution, grow with it, or depart to other networks.
Addressing ASB Perpetrators
A further, possible role of moderators is to go beyond identifying transgressors to addressing ASB itself. Much of the discussion of offline ASB is about identification of individuals—notably adolescents—who demonstrate anti-social personality disorder, and intervention treatments to reduce ASB. This does not appear to be a topic of discussion for online ASB. Most effort by social media platforms goes into preventing offensive material from reaching general users, with follow-up on individuals only so far as to identify their behavior in order to ban repeat offenders. It is an open question how much moderators on social media should or can take on the rehabilitation of individuals entrenched in the habit of ASB.
Challenges to Community and Crowd Moderation
Post-Traumatic Stress Disorder
As with human platform reviewers, community moderators come across disturbing postings that can leave them stressed and burned out (Gilbert, 2020). Prevailing norms on the social media platform, and the “free speech” ethos of social media, may encourage ASB, which may carry into internal communities. Reddit, for example, has been described by its chief executive officer Steve Huffman as “a place for open and honest conversations—‘open and honest’ meaning authentic, meaning messy, meaning the best and worst and realest and weirdest parts of humanity” (Marantz, 2018). Or, as the Reddit AskHistorians moderators have said, “I run the world’s largest historical outreach project and it’s on a cesspool of a website” (Gilbert, 2020).
Moderators, we might say, are the conservative “voice” of the community, that is, the one(s) to enforce local practices, prevent some content from going public, and manage non-conforming behavior. In response, moderators can come under personal attack that adds to distress and challenges their position in the community. However, if they shy away from enforcement, the community is left to try to manage itself. The number of subreddits banned due to lack of moderation suggests this is not a successful strategy:
In 2021, the large majority of communities [256,995] removed by Reddit admins on the platform were taken down due to a lack of active moderation. While Reddit is widely known for the freedom it leaves to its users in creating topic-oriented communities, the platform is also reportedly very engaged in the moderation of its content and subreddits. The second most common reason for Reddit admins to remove entire communities was content manipulation [132,391], while ban evasion and harassment were reported in 6,476 removal cases. (Statistica, 2022) [Numbers are derived from the chart at the same site.]
Attitude and Bias
Community moderators can hold a variety of attitudes to their role, reflecting what they believe they should be doing and what the community is about. These attitudes can affect their approach to decision-making. Seering et al. (2022) found that moderators use a variety of metaphors to describe their role, coalescing around five types of relationships with their communities: nurturing and supporting (e.g., as a gardener or custodian); overseeing and facilitating (mediator, referee); fighting for (combatant, protector); managing (manager, team member); and/or governing and regulating (judge, governor, dictator) the community. Different responses to ASB can be expected from moderators according to these conceptions of their role. As members of the community, they may be particularly susceptible to ASB attacks that conflict with their perceived role. For example, moderators who view themselves as gardeners may feel particularly disturbed by nuisance attacks that target community values and make decisions based on community repair; or those who view themselves as combatants may be most disturbed by personal attacks on individuals, and act to intervene as their defender.
Community moderators’ emotional range and response to ASB can be expected to change as exposure makes them more triggered by (or even more complacent about) particular content and posters. The General Aggression Model (GAM) suggests that every exposure creates knowledge structures about the kind of aggression observed (Allen et al., 2018). While the model suggests this is a mechanism for increased aggression, moderators who recognize the aggressive patterns can use this knowledge to address ASB. The overseeing and facilitating type of moderator presents just such a practice.
[M]oderators in the Overseeing and Facilitating Communities category expressed philosophies for ideal social interaction, both between themselves and communities and inter-community interactions. For example, several interviewees who moderated spaces for political discussions on Facebook known as “discourse groups” wrote extensive rules and expectations for appropriate modes of interaction. Correspondingly, these moderators’ heuristics for intervention evaluated whether a piece of content or a conversation would be disruptive to the community, a philosophy more akin to “firefighting” than gardening. (Seering et al., 2022, p. 631)
Crowd Behavior
When ASB starts, where does it stop? Does ASB generate ASB? Research by Erreygers et al. (2018) finds that ASB does not generate more negative behavior, but prosocial behavior does initiate more such positive behavior: “[R]esults indicate long-term positive, reinforcing spirals of prosocial exchanges, but no long-term negative spirals of cyberbullying perpetration and victimization.” Their work supports two theories: bounded generalized reciprocity (Yamagishi et al., 1999) in online contexts, “which posits that when people observe others behaving prosocially, this creates an expectation of reciprocal prosocial behavior and motivates them to behave prosocially themselves”; and the reinforcing spirals model (Slater, 2007), which holds that media use influences individuals’ cognitions and behavior, which in turn shape their subsequent media use.
While ASB may generate ASB (in keeping with GAM), in the online environment, other presentations of behavior may compete with the development of ASB. Slater (2007) found no evidence that being cyberbullied led to cyberbullying, attributing this to the observation of prosocial behavior by others in the online social space. A number of theories support the idea that practice begets practice, including structuration theory, adaptive structuration, generalized and bounded generalized reciprocity (Yamagishi et al., 1999), and Slater’s reinforcing spirals. This suggests that by modeling appropriate, non-aggressive responses to ASB, moderators and participants can influence interaction in the community.
Does ASB generate a response? Research by Gruzd et al. (2020) on r/metacanada (a popular subreddit of the Canadian alt-right) found that response rates differ by type of attack. Overall, 38% of ties were reciprocal (the authors report that this is similar to reciprocity in other online discussion groups). For anti-social postings, reciprocity levels were lower than overall rates, but differed by type of ASB attack. Reciprocity for “Attack on commentator” (personal; directly attacking another user) was 23.4%, whereas reciprocity for “Identity attack” (nuisance; attacking someone because of their identity) was much lower at 6.7%.
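Reciprocity of this kind is straightforward to compute on a directed interaction network. The sketch below uses networkx’s overall_reciprocity on invented data; per-type rates, as in Gruzd et al., would follow from restricting the edge set to one category of ASB before computing.

```python
# Sketch: fraction of directed ties that are reciprocated.
import networkx as nx

g = nx.DiGraph([
    ("a", "b"), ("b", "a"),   # a reciprocal pair
    ("a", "c"), ("c", "d"),   # unreciprocated ties
])
print(nx.overall_reciprocity(g))  # 0.5: 2 of 4 edges are reciprocated
```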
Since ASB's harm lies in its impact on others, where there is a response the impact is visible and potentially available for analysis. But some reactions are invisible, for example, when targets choose not to respond, remove themselves from forums, or cease use of a social medium. Awareness of communal views can lead to a spiral of silence, with self-censorship of opinion, a network effect based on knowing about others in the network: "the tendency of people not to speak up about policy issues in public—or among their family, friends, and work colleagues—when they believe their own point of view is not widely shared" (Hampton et al., 2014).
Conclusion
The concern about ASB lies in the transgression of social norms, and in the social and emotional effect on individuals, communities, and society. This article reviewed the nature of ASB online and examined ASB management on major social media platforms, in order to assess the extent of the issue and to weigh the pros and cons of current moderation practices.
Moderation was described at the environment, community, and crowd levels. Environment-level moderation aims to remove the most egregious offending content, but faces challenges with content that transgresses social norms yet is difficult to classify by automated means, and with content whose transgression is evident only in the reactions of others. Given the billions of postings to evaluate, the emphasis in environment moderation is on the content of postings. However, ASB may also be seen in the connections between participants, as crowds respond by reporting offensive content, and communities respond by answering, debating, and sanctioning offenders.
A few potential future directions for research and practice in response to ASB online are suggested by considering social networks. Examining patterns of response may reveal differences in the intensity and impact of ASB: for example, whether many, few, or none respond to the ASB; whether responses are directed primarily to the perpetrator; or whether the ASB initiates multi-way interaction among members of the network. Repeated patterns of attack and defense may reveal common structures of interaction that give insight into community tolerance of potentially offending content, for example, whether the content is taken as subject for censure or for debate. Notably, using network patterns as ASB identification tools bypasses the problem of tailoring detection to different kinds of media. Patterns of response are "medium agnostic": likes or dislikes, up or down votes, the addition of a comment or the posting of a reply all look the same whether made in response to words, audio, images, or video.
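To illustrate how little the content matters for such measures, the following minimal sketch (in Python; the record format and feature names are assumptions of this example, not an established instrument) computes response-pattern features from nothing more than who responded to whom:

    # Sketch: medium-agnostic response features for a single post.
    # Only who-responded-to-whom is used; the content is never inspected.
    def response_features(post_author, responses):
        """responses: list of (responder, addressee) pairs."""
        responders = {r for r, _ in responses}
        n = len(responses)
        to_author = sum(1 for _, a in responses if a == post_author)
        # Multi-way interaction: responses addressed to other responders
        # rather than to the original poster.
        among = sum(1 for _, a in responses if a != post_author and a in responders)
        return {
            "n_responses": n,
            "n_responders": len(responders),
            "share_to_author": to_author / n if n else 0.0,
            "share_multi_way": among / n if n else 0.0,
        }

    # Three users address the original poster; then two argue between themselves.
    print(response_features("op", [("u1", "op"), ("u2", "op"),
                                   ("u3", "op"), ("u1", "u2")]))

Identical feature vectors result whether the underlying post was text, an image, or a video.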
A combined look at texts and networks suggests further directions. Discussions could be examined for patterns of responses containing negative emotion in words or phrases that might point to anti-social intent by an initiating poster. A change in posting behavior, such as more words or more negative sentiment, might signify the start of ASB. Repeated patterns of such responses across multiple cases might identify particular triggers in content that create negative emotional impact, and/or particular patterns of response that signify an ongoing ASB encounter. While the initial data would be collected after the content has become public, identification of trigger words or phrases could then provide input for pre-publication moderation by algorithm or for intervention by a community moderator.
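A minimal sketch of such a signal, assuming a placeholder word list and arbitrary thresholds rather than any validated lexicon, might compare a poster's recent behavior against their baseline:

    # Sketch: flagging a possible shift toward ASB in a user's posting behavior.
    # The word list and thresholds are illustrative placeholders only.
    NEGATIVE_WORDS = {"hate", "stupid", "idiot", "worthless"}

    def negativity(post):
        """Share of a post's words that appear in the placeholder lexicon."""
        words = post.lower().split()
        return sum(w.strip(".,!?") in NEGATIVE_WORDS for w in words) / max(len(words), 1)

    def shift_detected(baseline_posts, recent_posts,
                       length_ratio=1.5, negativity_delta=0.05):
        base_len = sum(len(p.split()) for p in baseline_posts) / len(baseline_posts)
        recent_len = sum(len(p.split()) for p in recent_posts) / len(recent_posts)
        base_neg = sum(map(negativity, baseline_posts)) / len(baseline_posts)
        recent_neg = sum(map(negativity, recent_posts)) / len(recent_posts)
        # Flag when posts become markedly longer and markedly more negative.
        return (recent_len > length_ratio * base_len
                and recent_neg > base_neg + negativity_delta)

    baseline = ["Nice photo!", "Thanks for sharing this."]
    recent = ["You are a stupid, worthless idiot and everyone should hate you."]
    print(shift_detected(baseline, recent))  # True under these placeholder settings

In practice, a flagged shift would be routed to a human moderator rather than trigger automatic action.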
Other directions for enhancing content moderation are suggested by the importance of community moderation. The greater the need for human interpretation to assess potential offense, the more work falls to community moderators. This suggests designing system-based tools to aid these moderators. For example, moderators might compile sets of trigger words that can then serve as input for filtering or tagging posts. Similarly, the history of offenses in the environment might be made available to moderators, or tools provided that allow moderators to create their own lists of offenses. Repositories of such community information could be shared among community moderators, and perhaps across communities, for the benefit of the social media platform.
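As a toy example of such a moderator-facing tool (the categories, phrases, and function names are invented for this sketch), a community-compiled trigger list could be applied to tag incoming posts for human review, with the underlying dictionary serialized and shared across communities:

    # Sketch: tagging posts against a moderator-compiled trigger list.
    # Categories and phrases are placeholders; in practice the dictionary
    # could be exported (e.g., as JSON) and shared among moderators.
    import re

    TRIGGERS = {
        "identity_attack": ["placeholder_slur"],
        "threat": ["i will find you"],
    }

    def tag_post(post, triggers=TRIGGERS):
        """Return the trigger categories a post matches, for moderator review."""
        return [label for label, phrases in triggers.items()
                if any(re.search(rf"\b{re.escape(p)}\b", post, re.IGNORECASE)
                       for p in phrases)]

    print(tag_post("I will find you and your friends."))  # ['threat']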
More ideas about local control are being implemented in new decentralized social media platforms, such as Bluesky and Mastodon, which support Twitter-like postings. Their decentralized protocols allow users to create their own servers, retain control of their data, and create their own social media communities; Mastodon refers to such a community as an instance. Each instance can set its own rules about blocking or silencing other instances, for example, those that "permit horrible things and behaviours."
You can choose an Instance by language, moderation policy, political views or any other criteria . . . . A well-run Instance will have its policies clearly written on its homepage and also publish the list of blocked or silenced Instances. (Mastodon, n.d.)
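As a rough sketch of what such instance-level policy amounts to (the domains and the policy vocabulary below are illustrative, not Mastodon's actual configuration format):

    # Sketch: an instance's federation policy list, in the spirit of
    # Mastodon's published block/silence lists. Domains are invented.
    POLICY = {
        "awful.example": "suspend",       # no federation at all
        "borderline.example": "silence",  # posts hidden from public timelines
    }

    def federation_action(remote_domain):
        """Default is open federation with unlisted instances."""
        return POLICY.get(remote_domain, "federate")

    print(federation_action("awful.example"))     # suspend
    print(federation_action("friendly.example"))  # federate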
Bluesky's composable moderation (Garber, 2023) provides means for users to select and set viewing options for content categories:

With a few straightforward settings, the app lets individuals decide whether they want to hide or show—or warn before showing—certain kinds of content like "explicit sexual images," "political hate groups," or "violent/bloody content." The company also says it takes a "first pass" on moderating its central server in order to remove illegal content and label "objectionable material." (Ghaffary, 2023)
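A schematic of what such composable, per-user settings amount to (the labels and actions below are illustrative only, not Bluesky's actual API or label vocabulary):

    # Sketch: per-user preferences for labeled content, in the spirit of
    # composable moderation. Labels and actions are illustrative only.
    PREFERENCES = {
        "explicit-sexual-images": "hide",
        "political-hate-groups": "warn",
        "violent-bloody-content": "show",
    }

    def render_decision(post_labels, prefs=PREFERENCES, default="show"):
        """Apply the strictest matching preference: hide > warn > show."""
        strictness = {"show": 0, "warn": 1, "hide": 2}
        actions = [prefs.get(label, default) for label in post_labels]
        return max(actions, key=strictness.__getitem__, default=default)

    print(render_decision(["political-hate-groups"]))  # warn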
These developments suggest new options that respond to the challenges facing major social media, yet all social media share a common plight in operating open, online interaction spaces. Methods will always be needed to remove egregiously offending and illegal material and to remove repeat ASB offenders. Human moderators, whether overseeing Facebook groups, Reddit subreddits, or Mastodon instances, will always need to engage with how to manage, moderate, and sustain online communities. Content moderation will need to adapt to ongoing human discourse and remain vigilant for ASB. Good practice will continue to require both automated and human review, but there is room for more tools and options on the human side of moderation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the Social Sciences and Humanities Research Council of Canada.
Author Biography
), an international organization focused on exploring the role and impact of analytics in support of teaching, learning, and academic achievement. Current work examines the nature and impact of anti-social behavior online in association with the Social Media Lab at Toronto Metropolitan University.
