Sage Journals: Discover world-class research

Abstract

This study examines how visibility of a content moderator and ambiguity of moderated content influence perception of the moderation system in a social media environment. In the course of a two-day pre-registered experiment conducted in a realistic social media simulation, participants encountered moderated comments that were either unequivocally harsh or ambiguously worded, and the source of moderation was either unidentified, or attributed to other users or an automated system (AI). The results show that when comments were moderated by an AI versus other users, users perceived less accountability in the moderation system and had less trust in the moderation decision, especially for ambiguously worded harassments, as opposed to clear harassment cases. However, no differences emerged in the perceived moderation fairness, objectivity, and participants confidence in their understanding of the moderation process. Overall, our study demonstrates that users tend to question the moderation decision and system more when an AI moderator is visible, which highlights the complexity of effectively managing the visibility of automatic content moderation in the social media environment.

Keywords

Social media online harassment content moderation flagging AI moderator automated moderation system

Introduction

Moderation in social media platforms has become a prominent topic, especially with the rising wave of cyberbullying and online harassment. Online harassment ranges from offensive name-calling or purposeful embarrassment to stalking, physical threats or harassment over a sustained period (Vogels, 2021). According to surveys published by Pew Research, as many as 41% of U.S. adults have personally experienced online harassment (Vogels, 2021), and 66% have witnessed potentially harassing behavior directed toward others online (Duggan, 2017). To understand how content moderation works to prevent and remove inappropriate content, activists have pushed for more transparency in moderation decisions (Santa Clara Principles, 2018), especially since each social media platform (i.e. Facebook, Twitter, etc.) has its own set of community guidelines and regulations. Yet, most moderation systems and decisions remain opaque (Banchik, 2020; Crawford and Gillespie, 2016; Suzor et al., 2019), which can leave users confused and unsure about how and why content moderation occurs (Crawford and Gillespie, 2016; Jhaver et al., 2019a). While there is abundant critical scholarship focused on the issue of content moderation (e.g. Barocas et al., 2013; Gillespie, 2014, 2020; Kroll et al., 2016), few empirical studies have looked at how information visibility on moderation decisions affects users. One stream of research has started to focus on the reaction of the moderated users (Jhaver et al., 2019a, 2019b), while other literature has started to explore the effect of information visibility on online community members (e.g. Matias et al., 2020). For instance, when a welcome note to newcomers explained the community's norms and mentioned that harassment was uncommon within the community, newcomers’ participation in online feminism discussions increased (Matias et al., 2020). Yet, there is limited understanding of how lay users perceive moderation with different types of content and moderation source visibility.

Previous research has shown that users develop folk theories about why or how content has been taken down, and users often think that other humans are primarily responsible for such actions (Myers West, 2018). Yet, social media companies often use a mix of different moderators to regulate content on their website, from automated systems to commercial moderators to users themselves, but we do not know whether users react differently to different sources of moderation. In this study, therefore, our goal is to explore how the display of the source of moderation (AI vs. other users vs. unspecified source) affects users’ perception of, agreement with, trust in, and perceived fairness of the moderation decision. We also investigate how the source of moderation affects users’ belief certainty in how moderation works on the social media site as well as perceived objectivity and accountability of the moderation system.

This study is important for several reasons. First, it is critical to focus on online community members’ perspectives as they, as witnesses of problematic content, are indirectly impacted by harassment and, consequently, by the moderation systems in place. Information visibility, and specifically the source of moderation, can have important consequences on how users perceive social media platforms, learn from the moderation system in place, and whether they decide to become members and participate in an online community (Jhaver et al., 2019b; Matias et al., 2020). Second, existing approaches to algorithm analysis have paid limited attention to the relationships between platform policy and user perspectives (Shin and Park, 2019), so understanding users’ perception is needed to fully comprehend the ecosystem of moderation decisions.

Literature review

Sources of moderation

Social media platforms rely on both automated and human moderation systems to govern the content allowed on their sites. Platforms are working toward creating automated tools to identify hate speech, adult content, and extremism more quickly and at a larger scale than human reviewers, ideally removing this type of content before any human sees it. However, many researchers caution against a full reliance on automatic moderation. Researchers have pointed out that automated moderation tools are not very good at accounting for content subtlety, sarcasm, or cultural meaning (Duarte et al., 2018). They are also not able to easily adjust for context, for example, terrorist speech conveyed within a journalistic post (Llanso, 2019). Additionally, automated moderation systems can exacerbate rather than ameliorate the content policy problems that platforms face. Specifically, these systems, even when well-designed, increase opacity, make moderation practices more difficult to understand, and can reduce fairness within large-scale platforms (Gorwa et al., 2020). In the same vein, Gollatz and colleagues (2018) point out similar issues with ambiguity in the automation of content moderation, and that opaque implementation, vague definitions, and lack of accountability stand in the way of accuracy for automated content moderation.

The other type of moderation employed by platforms, “human” moderation, can take the form of commercial content moderation or community-based moderation. There has been an extensive discussion in the literature around the issues that arise from platforms’ use of commercial content moderators, people whose job it is to look through potentially harmful content before it is posted on social media platforms. Specifically, commercial moderation has been criticized for its harmful labor practices, including extremely stressful and mentally taxing working conditions and mental health challenges faced by moderators, as well as for its overall lack of transparency and accountability (Gillespie, 2018; Roberts, 2016).

On the other hand, sites like Wikipedia, Reddit, and Twitch rely on their own user communities to do most of the moderation. Seering and colleagues (2019) found that this type of moderation system leads to significantly different dynamics compared to when moderation is driven top-down by company policy. User-driven moderation is an intensely social process that is core to community development (see Gillespie, 2020; Llansó, 2020). Even sites that do use top-down moderation approaches like commercial moderators often do so in conjunction with user flagging, which is framed in terms of self-regulation of social media platforms. Many platforms have specific community guidelines and encourage users to flag content that violates these guidelines to distribute the labor of moderation. This also shifts responsibility (or at least perceived responsibility) onto the shoulders of the users rather than on the site.

Calls for increased transparency have grown in the literature, as it is believed that transparency increases understanding, fairness and trust (see Santa Clara Principles, 2018). However, many researchers have rightfully pointed out that blanket transparency may not truly be desirable, especially because transparency can be manipulated, by emphasizing specific information and downsizing other types of information (Berkelaar and Harrison, 2017; ter Hoeven et al., 2021). For example, too much transparency as to how decisions are made could lead to increases in individuals’ attempts to “game” the system. Some scholars have pointed to ways that moderating algorithms can be “gamed” and moderation can be used to harass and silence certain (often minority) groups online (Diakopoulos, 2015; Munger, 2017; Noble, 2018).

Given the different types of moderation source and the impact that revealing each of the sources can have on users, we explore how each of the moderation sources (AI, other users or unknown) affects users’ perceptions of 1) moderation decision via trust and fairness, their agreement with the moderation decision, and likelihood to look at the site content policy; 2) moderation process via belief certainty; and 3) moderation system via perceived system accountability and objectivity. In the next section, we explain the reasoning behind each of our research questions, by accounting for users’ perceptions of moderation decisions to their general understanding of moderation process to the platform's moderation system as a whole.

Fairness and Trust in the Moderation Decision

As the use of algorithmic systems has increased across platforms and technologies, many researchers have started to examine users’ perceptions of algorithms’ fairness, accountability, and transparency (FAT) (Shin and Park, 2019). Broadly speaking, perceived fairness of algorithms reflects a tenet that algorithmic decisions should not be discriminatory or prejudiced and should not lead to unjust consequences (Yang and Stoyanovich, 2017). For example, Facebook states that they strive to have fair moderation decisions, meaning that the same content posted by two users would be equally likely to be either pulled down or left up regardless of who posted the content (Gorwa, 2018). However, fairness can get quite complicated as the perception of a decision's fairness is largely contextual and subjective. Gorwa (2018) points out that aspects of fairness involve balancing individual rights to expression against potential widespread societal harm, and this balancing aspect may vary from users to users or from platforms to platforms.

In addition, another key facet of user perception of moderation is trust. Shin and Park (2019) found that when users had higher levels of trust in algorithmic systems, they were more likely to see algorithms as fair and accurate. Additionally, trust moderated the relationship between FAT and user satisfaction in their study. Other research found that transparency of an automatic moderation system is a prerequisite for trust in that system (Brunk et al., 2019). Yet, an important question is whether trust and perceived fairness of the moderation decision vary by type of moderation source (AI vs. user-based) and its visibility.

Agreement with the Moderation Decision and Likelihood to Look at the Site Content Policy

In addition to perceptions of the moderation decision, it is important to understand users’ actions around it. While social media companies give users the tools to help moderate inappropriate content, not knowing the source of moderation can confuse users and lead to inaccurate guesses about moderating decisions (Myers West, 2018; Santa Clara Principles, 2018; Suzor et al., 2019). Similarly, visibility of the source of moderation might influence users’ decision to check the site content policy when seeing a moderated comment. Looking at the site content policy can serve as an indicator of whether users question or want to gain more information on the moderation system and regulation.

Belief Certainty in the Moderation Process and Perceived Accountability and Objectivity of the Moderation System

For the perception of the moderation process, we examine belief certainty referring to users’ confidence around their understanding of how the moderation process works on the site. Certainty is an important dimension of attitudes and beliefs because it influences behavior's stability and durability over time (Bargh et al., 1992; Grant et al., 1994; Gross et al., 1995). People's confidence around the moderation process is likely to be linked to visibility or transparency around the identity of the moderations and explanations for why certain content has been taken down (Grimmelmann, 2015). When the moderation process is obscure, users lack the basis for evaluating moderation and interpret moderators’ decisions in a subjective (and often inaccurate) way (Kempton, 1986). For example, when a moderation system was opaque, users whose content was moderated by algorithms often erroneously attributed moderation to other users instead of algorithms (see Eslami et al., 2015; Myers West, 2018). In other words, in the absence of transparency around moderation process, users resort to their own folk theories (Eslami et al., 2016), which is a combination of a priori beliefs about how the system works and new information gained from direct experience with the platform (DeVito et al., 2018; French, 2017). Similar to how an explanation around content removal facilitates users’ understanding of how moderation works (Jhaver et al., 2019a), it can increase their belief certainty, and we specifically examine how disclosing the source of moderation and types of moderators (users vs. AI vs. unknown) influence belief certainty in the moderation process.

On the platform level, we examine users’ perceptions of the moderation system's accountability and objectivity. Algorithmic accountability refers to perceived impacts or consequences of an algorithmic system, and the extent to which the algorithmic system (or its creators) is held accountable for unintended harms that arise due to the system's decisions. Users may be concerned that algorithmic systems are vulnerable to making mistakes or can lead to undesired consequences (Lee, 2018). Accountability is also often linked to visibility of information. Being able to make observations from visible information helps create insights and knowledge required to hold systems accountable. It has been shown that visibility of information can create two forms of perceived accountability: a “soft” accountability, where systems and platforms must answer for their action, and a “hard” accountability, where insights and knowledge that come as a result of visibility bring about “power to sanction and demand compensation for harms” (Ananny and Crawford, 2018: 976; see Fox, 2007). Following Rader and colleagues (2018), we refer to perceived accountability as participants’ beliefs of “how the system might be accountable to them, as individual users” (p. 8) through measuring their perceived control over the outputs and perceived system's fairness, and we raise a question of how visibility around the source of moderation might influence perceptions of system accountability.

Finally, we also explore perceived objectivity of the moderation system. Perceived objectivity has been linked to higher perceived trust and credibility in algorithmic decisions (Sundar and Nass, 2001). People often assume that algorithmic decisions are more credible than human decisions because computers are more objective than humans (Sundar and Nass, 2001). However, individuals can vary in their experience with algorithms, and when a machine is too “machine-like,” system objectivity can backfire and users can infer more trust in human than machine decisions (Dietvorst et al., 2015; Waddell, 2018). Yet, perceived objectivity in moderation decisions hasn't been yet explored in relation to transparency around the source of moderation.

Ambiguity in moderation

As discussed previously, there is a great deal of opaqueness surrounding content moderation. Content ambiguity can add another layer of opacity, especially because different social media platforms have different strategies for dealing with problematic content and even have different definitions of allowable or appropriate content. Pater and colleagues (2016) conducted a content analysis of the harassment regulation policies for fifteen different social media platforms. They find what they call a “striking inconsistency” between different platforms in their definitions of harassment. This lack of a common definition of harassment across platforms can cause uncertainty on how to respond to different types of offenses. As a result, there is a discrepancy between the ways in which different platforms respond to harassment. For example, some platforms merely censor the offensive content, while other policies point to the potential involvement of government and law enforcement such as police and intelligence agencies (Brown and Pearson, 2018).

The context and intentionality, previous interactions between the harasser and the person being harassed, offline power dynamics and many other factors can all be important in determining if a post or a comment constitutes harassment (Langos, 2012). In fact, flagging can sometimes move from a mechanism of reporting and upstanding into something that can be “gamed” and abused (Crawford and Gillespie, 2016). Finally, because of ambiguity in harassment comments, building automated systems to detect harassment can be difficult and these systems cannot necessarily understand and interpret ambiguity (Nadali et al., 2013).

Therefore, we explore how content ambiguity can moderate effects of source visibility on users’ perceptions around moderation decision (trust and fairness), moderation process (belief certainty), and moderation system (accountability and objectivity), as well as people's agreement with the moderation decision and their likelihood to look at the site policy. A summary of our predictions is presented in Figure 1.

Figure 1.

Summary of predictions.

Methods

This study was part of a larger experiment, which resulted in two separate studies, each one involving different research questions, hypotheses, and dependent variables (first study: Bhandari et al., 2021). The experiment was pre-registered on the Open Science Framework, with the research questions, measures, and analysis plan available at the following https://osf.io/bwm8e. It involved a custom-made social media platform called “EatSnap.Love,” which focuses on sharing pictures of food and reproduces basic functionalities of a social network site (DiFranzo et al., 2018; Taylor et al., 2019). This platform was designed as a testbed to study users’ reactions to harassment online in a realistic yet controlled environment. As a cover story, we told participants that they would be beta testing a new social media platform. On this platform, users can create and share their own social media posts through a newsfeed, and to comment, like, and flag posts. Within EatSnap.Love, we have used a social media simulation engine called Truman, which mimics other users’ behaviors on the site through pre-programmed bots that could comment or like participants’ posts and others’ posts that were displayed on the participant's newsfeed as if they were coming from real users on the site. This way the Truman platform allows for a realistic social media experience while preserving the experimental control through identically curated socio-technical environments for each participant assigned to a similar experimental treatment. The Truman platform is open-source and allows for easy replication of this or any other study, as all simulation material (bots, posts, images, etc.) as well as the platform itself are available on this GitHub https://github.com/cornellsml/truman.

Participants

We recruited 582 participants from Amazon MTurk to participate in our study in exchange for a $10 compensation. The power analysis calculation using the software program G*Power yielded a target sample size of 350, with .80 power to detect a medium effect size of .20 at the standard .05 error probability. To receive the full payment, participants were asked to log into EatSnap.love at least twice a day for two days and to create at least one post each day. When signing up on the platform, participants could see the terms of service as well as the community rules (see supplementary material for details). After removing participants who did not complete at least one day of the two-day study, our final sample size was 397 (54% female). Participants were located in the US, and their ages ranged from 18 to 70 (M = 35.50, SD = 9.70). More than 70% of them had some college degree or a 4-year college degree, and 74% were white (296), 10.5% were Asian, 7% were black or African American, 6% were Hispanic or Latino, 0.5% were American Indian or Native American, 1% were Asian and 2% indicated being another race or ethnicity outside of these options.

Experimental design

Participants were randomly assigned to one of the six conditions, and in all the conditions, they encountered in their newsfeed one moderated comment per day. We employed a 3 (other users vs. an automated AI system vs. no source identified) × 2 (ambiguous vs. clear harassment comment) between-subject factorial design, as described in the pre-registration. The visibility of the source of moderation of the harassment comment was manipulated by mentioning that either an automated system or other users on the site moderated the comment, or not mentioning the source of moderation at all (See Figure 2 for screenshots of the conditions). Depending on the experimental treatment, ambiguity of the moderated comment was also manipulated, making the moderated comment either clearly harassing or ambiguously worded (See Table 1).

Figure 2.

Screenshot of the moderated social media comment. The screenshot shows the full post for Day 1 for the unambiguous comment and for the AI as moderation source condition. For the other conditions, instead of “our automated system has flagged this comment as harassment,” it was written for the unspecified source condition: “This comment has been flagged as harassment” and for the other users condition: “Other users have flagged this comment as harassment.”

Table 1.

Comments selected for the experiment across conditions and days.

	Condition	Day 1	Day 2
Flagged	Unambiguous	CMT1.1: Is she pregnant? Or did she just gain 500 pounds? LOL #chubs #cheesecake	CMT2.1: This photo is uglier than you and that's saying something
Flagged	Ambiguous	CMT 2.1: ummmm imma pass on ever eating this	CMT 2.2:plzzz don't ever cook this for me

To determine the appropriate level of ambiguity of the comments, we pilot tested 40 comments that were evaluated by 145 participants recruited on MTurk. The comments were initially created by our research team based on real comments they saw on social media platforms. The pilot test ascertained the difference between ambiguous and clear harassment content by evaluating perception of harassment for each comment (3 questions, e.g. to what extent do you see this as online harassment? 1 = not at all to 9 = very much so). Two ambiguous and two clear harassment comments were selected for both Day 1 and Day 2 and the results of the pilot test showed that the comments (ambiguous vs. clear harassment) were statistically significant for Day 1, t(144) = 16.52, p < .01, and for Day 2, t(144) = 19.61, p < .01.

Procedure

Participants first completed a pre-survey that collected demographic information as well as information on their social media use, web skills, and tolerance for ambiguity and affinity for technology. When registering on the study's social media site, participants could read the community rules (e.g. no bullying, no non-food posts, etc.). Each day of their participation in the study, all participants were exposed to a harassment comment between two different EatSnap.love users (bots) that was moderated by one of the moderation sources, randomly assigned to each participant out of the three choices described above. The moderated comment was always displayed near the top of the newsfeed, randomly placed between the first and sixth post Participants visited EatSnap.love on average 12.5 times on the first day and 8.5 times the second day. Participants also engaged with the site by liking on average 23 posts, commenting on 3.68 posts, and flagging 1.62 comments. At the end of the two days, participants completed the post-survey, and their account was deactivated.

Measures

There were two types of measures used in the study: two behavioral measures captured through log data during participants’ interaction on the platform, and post-survey measures. The behavioral measures for this study were: 1) agreement with the moderation decision (“yes”/”no” prompt if they agreed with the moderator's decision) (see Figure 2) and 2) the choice to view the site's moderation policy (after participants indicated their answer to the agreement with the moderation decision question, another question appeared asking if participants wished to review the site's community rules; their positive response would take them to the community rules page. The post-survey included measures on 1) the moderation decision (trusting in the decision and its perceived fairness), 2) the moderation process capturing participants’ belief certainty regarding how they understood the moderation process on the platform, and 3) perceived accountability and objectivity of the platform moderation system.

For judgments on the moderation decisions, we evaluated trust in the moderation decision (“How much do you trust that EatSnap.Love make(s) good moderation decisions?”, measured on 1 = no trust at all to 7 = extreme trust), and perceived fairness of the moderation decision (“How fair is it for [scenario subject] that their comment was flagged?”, measured on 1 = very unfair to 7 = very fair; Lee, 2018). Belief certainty about how the moderation process works was captured with three items (alpha = .90; e.g. “How certain are you that you know how EatSnap.Love's moderation process works?” 1 = not at all certain to 5 = extremely certain) that we adapted from French (2017). Finally, we captured judgments about the platform moderation system through perception of system accountability with three items (alpha = .87), (e.g. “EatSnap.Love's flagging system acts in the best interest of the people who use it,” “EatSnap.Love's flagging system behaves the same way for everyone who uses it.” 1 = strongly disagree to 7 = strongly agree), adapted from Rader and colleagues (2018). For perception of system objectivity, we adapted the scale from Sundar and Nass (2001), consisting of four adjectives (objective, fair, fact-based, and biased (reverse coded) and asking participants to rate to what extent they believed that each of them described the EatSnap.Love flagging system (1 = describes very poorly to 7 = describes very well; alpha = .87).

Results

All the analyses include the following control variables: web skills, affinity with technology, tolerance for ambiguity, age, gender, and education; they are reported only when found to be significant predictors on the dependent variables. We present below our most pertinent results organized by participants’ perceptions of (1) the moderation decision, (2) the moderation process, (3) the moderation system, and an additional category of (4) behavioral variables (agreement with the moderation decision and choice to review the site's moderation policy). Moderation decision is based on the perception of the moderation related to harassment comment; moderation process emphasizes the understanding of how the process works, and perception of the moderation system focuses on the judgment of the overall moderation system. A complete set of the results can be found in Tables 2 and 3. For thoroughness and consistency, we ran the main effects and interactions on all our dependent variables, which means that some results presented below are not included in our pre-registration. For continuous dependent variables, we used the AOV modeling function from stats package on R software to perform ANCOVAs to highlight differences in means between the moderation sources. For categorical dependent variables, we used the GLMER modeling function from LME4 package (Bates et al., 2015: 4) to perform binary logistic regressions.

Table 2.

ANCOVA results.

Source	Dependent variables	D	F	Cohen's d
Source of moderation (A)	Perceived trust	2	3.25**	0.13
	Perceived fairness	2	0.33	0.04
	Belief certainty in the system	2	1.48	0.09
	Perception of system accountability	2	2.73*	0.12
	Perception of system objectivity	2	0.26	0.04
Ambiguity of harassment comment (B)	Perceived trust	1	31.92****	0.29
	Perceived fairness	1	241.92****	0.79
	Belief certainty in the system	1	12.47****	0.18
	Perception of system accountability	1	20.60****	0.23
	Perception of system objectivity	1	27.62****	0.27
Interaction (A) * (B)	Perceived trust	2	3.48**	0.13
	Perceived fairness	2	1.31	0.27
	Belief certainty in the system	2	1.47	0.09
	Perception of system accountability	2	3.38**	0.13
	Perception of system objectivity	2	0.56	0.05

Note. Control variables included in the analyses: web skills, affinity with technology, tolerance for ambiguity, age, gender, and education.

**** = p < .000, *** = p < .01, ** = p < .05, * = p < .1.

Table 3.

Binary logistic regression results.

Dependent variables	Predictors	b ^a	Z value	S.E.
Agreement with moderation decision	Source of moderation (AI vs. unknown) (A)	0.05	0.11	0.44
(Responding vs. ignoring the question)	Source of moderation (AI vs. Users) (B)	0.17	0.40	0.43
	Ambiguity of harassment comment (C)	−1.21***	−3.14	0.38
	Interaction (A) * (C)	0.10	0.11	0.88
	Interaction (B) * (C)	0.65	0.75	0.87
Agreement with moderation decision	Source of moderation (AI vs. unknown) (A)	−0.17	−0.45	0.38
(Answering no vs. yes)	Source of moderation (AI vs. Users) (B)	0.23	0.59	0.39
	Ambiguity of harassment comment (C)	3.88****	6.77	0.57
	Interaction (A) * (C)	−0.71	−0.92	0.78
	Interaction (B) * (C)	−0.52	−0.65	0.80
Likelihood to look at site policy	Source of moderation (AI vs. unknown) (A)	0.23	0.94	0.24
(Responding vs. ignoring the question)	Source of moderation (AI vs. Users) (B)	0.46*	1.87	0.24
	Ambiguity of harassment comment (C)	−0.17	−0.87	0.20
	Interaction (A) * (C)	0.29	0.61	0.48
	Interaction (B) * (C)	−0.20	−0.41	0.49
Likelihood to look at site policy	Source of moderation (AI vs. unknown) (A)	0.43	1.39	0.31
(Answering no vs. yes)	Source of moderation (AI vs. Users) (B)	0.22	0.69	0.33
	Ambiguity of harassment comment (C)	−0.34	0.20	0.26
	Interaction (A) * (C)	−0.06	−0.10	0.62
	Interaction (B) * (C)	−0.07	−0.11	0.65

Note. Control variables included in the analyses: web skills, affinity with technology, tolerance for ambiguity, age, gender, and education.

Unstandardized coefficient estimate.

**** = p < .000, *** = p < .01, ** = p < .05, * = p < .1.

Perception of moderation decision: fairness and trust

Perception of the moderation decision was measured via fairness and trust While the moderation source did not change perception of fairness (see Table 2), it did impact perception of trust in the moderation decision F(2389) = 3.25, p = .04, Cohen's d = .13. A pairwise comparison assessing mean difference showed that participants in the automated moderation condition reported lower trust in the moderation decision (M = 4.89, SE = .12) than those in the other users as source of moderation condition (M = 5.27, SE = .10, p = .04). No differences were detected when comparing these two moderators with the unidentified source condition (M = 5.11, SE = .11). From all the control variables, only affinity with technology had an effect on trust, with higher affinity for technology indicating higher trust in the moderation decision F(1389) = 6.21 p = .01, Cohen's d = .16.

Furthermore, there was a significant interaction between the source of moderation and comment ambiguity on perceived trust, F(2387) = 3.48, p = .03, Cohen's d = .13. When the comment was ambiguously harassing another user, participants in the automated condition perceived less trust in the moderation decision (M = 4.38, SE = .18) than those in the other users condition (M = 4.95, SE = .15) b = −.59, SE = .20, p = .01, or when moderation source was unknown (M = 4.94, SE = .16) b = −.55, SE = .20, p = .02. Therefore, participants perceived less trust in the moderation when the decision was made by an AI than by other users or unidentified moderation source, but only when the comment was ambiguously worded. Indeed, no significant differences emerged in the clear harassment comment condition between the AI and the unknown moderation source conditions, b = .22, SE = .22, p = .56, and between AI and the other users conditions, b = −.29, SE = .22, p = .39.

Ambiguity of the harassment comment did impact both perception of fairness F(1389) = 241.92, p < .001, Cohen's d = .79 and trust F(1389) = 31.92, p < .001, Cohen's d = .29. Specifically, participants in the ambiguous condition perceived lower trust in the moderation decision (M = 4.74, SE = .10) than those in the clear harassment comment condition (M = 5.48, SE = .08) as well as lower perceived fairness (ambiguously worded comment: M = 3.93, SE = .13, clearly harassing comment: M = 6.37, SE = .08).

Perception of moderation process: belief certainty

For perception of the moderation process, we captured belief certainty about how the moderation process works. The results demonstrate that the moderation source did not impact how certain participants felt about how the moderation process worked, and there was no interaction between moderation source and comment type (see Table 2). However, comment type significantly impacted belief certainty F(1389) = 12.47, p < .000, Cohen's d = .18, such that, unsurprisingly, participants in the ambiguously worded condition perceived lower belief certainty in the system (M = 2.50, SE = .07) than those in the clear harassment comment condition (M = 2.86, SE = .08).

Perception of moderation system: accountability and objectivity

Perceptions about the moderation system were captured with participants’ assessments of system objectivity and system accountability. While moderation source did not impact perception of system objectivity, it did impact system accountability, with higher ratings capturing participants’ belief that the moderation system acted in the best interest of its users. Our results showed a marginally significant effect of the source of moderation F(2389) = 2.73, p = .07, Cohen's d = .12 on system accountability such that the system was perceived to be acting less in the best interest of its users in the automated condition (M = 4.97, SE = .13) than in the other users’ condition (M = 5.34, SE = .11, p = .05). No differences were significant when comparing the AI or other users’ condition with the unidentified source condition (M = 5.16, SE = .11), with p with AI condition = .50 and p with other users condition = .46.

When adding the interaction term to the models, moderation source and comments type were qualified by a significant interaction F(2387) = 3.38, p = .035, Cohen's d = .13 on system accountability. When the comment was ambiguously harassing, participants in the automatic content moderation condition had a tendency to perceive the system as acting less in the best interest of its users (M = 4.52, SE = .18) than those in the other users as moderators condition (M = 5.01, SE = .16), b = .50, SE = .22, p = .05 or when moderation source was unknown (M = 5.10, SE = .16), b = −.55, SE = .22, p = .03. However, no difference was found between the unknown condition and the other user condition, b = .04, SE = .22, p = .98. No such differences emerged with clear harassment messages between AI and the unknown moderation source condition, b = .28, SE = .24, p = .456, and between AI and the other users conditions, b = −.14, SE = .23, p = .81.

Comment type had a significant effect on both perception of system accountability F(1389) = 20.60, p < .000, Cohen's d = .23 and on system objectivity F(1389) = 27.62, p < .000, Cohen's d = .27. When the comment was ambiguous, participants reported lower perception of the system acting in users’ best interests (M = 4.86, SE = .10) and lower feelings of system objectivity (M = 4.36, SE = .07) compared to when exposed to clear harassment comment (accountability: M = 5.49, SE = .09, objectivity: M = 4.85, SE = .05). Tolerance for ambiguity was also a significant covariate when looking at accountability, with higher tolerance indicating higher feelings of accountability, F(1389) = 4.04, p = .04, Cohen's d = .10.

Likelihood to check content policy and agreement with the moderation decision

For our final analyses, we examined the effect of source visibility and comment ambiguity on behavioral actions. We used logistic regressions to test whether the source of moderation, the comment type and their interaction impacted participants’ decision to agree with the moderation decision and their likelihood to check the content policy. We analyzed agreement with the moderation decision in two ways, first by comparing those who responded to the prompt “do you agree with the moderation decision?” and those who ignored it completely, and second, within the group who responded, we compared participants who agreed versus those who disagreed with the moderation decision. The only significant impact in both cases was produced by the content ambiguity, with participants in the clearly harassing cases being more likely to respond than to ignore the prompt, b = −1.21, SE = .64, p < .01, and from everyone who responded, participants in the clearly harassing cases were more likely to agree with the moderation decision, b = 3.88, SE = .57, p < .001. Specifically, 94.5% of those in the clear harassment condition answered the prompt, compared to 84.1% participants in the ambiguous condition; and from everyone who responded to the prompt, 94.7% of the respondents agreed with the moderation decision “yes” in the clear harassment condition versus only 27.1% in the ambiguous comment condition. Moderation source was not a significant predictor, and there were no moderation effects (See Table 3 for a complete set of results). Regarding the likelihood to check the content policy, results showed non-significant differences between those who responded and those who did not, as well as between participants who responded “yes” or “no” to the prompt for each of our manipulated conditions (see Table 3).

Discussion

The goal of this study was to understand how making a content moderator visible in a social media environment may influence perceptions of the moderation system. We found that users tend to question (1) their trust in the moderation decision and (2) the accountability of the moderation system when moderation was attributed to AI compared to other users or an unknown source. However, instead of AI moderation being seen as less trustworthy and accountable across the board, the difference between AI and the other sources of moderation only emerged for content moderation of ambiguously worded harassment, and not for clear harassment cases. Below we explain the meaning of these results from the point of view of users’ engagement with problematic content moderated by different sources, as well as the implications for making moderation sources visible on social media platforms.

With regards to the perceptions of moderation decision, we found that users were less trusting when exposed to an automated system moderator (vs. other users or unknown source) when the harassment comment was ambiguously worded. Interestingly, no differences in source of moderation nor in the interaction of source of moderation and comment type were found for fairness of the moderation decision. This is somewhat surprising as fairness is a critical component of trust in policy making (see report OECD, 2017), with higher perceived fairness leading to higher trust in the decision. One possible explanation may have to do with an a priori knowledge of the outcome. When people make judgments about a decision (or about the decision maker) with the knowledge of a certain outcome, they tend to place less weight on fairness (Tyler, 1996). Knowing that the comment was already flagged may have dampened users’ judgment of fairness. However, our participants tended to have less trust in the moderation decision when the moderator was an AI (vs. other users or unidentified source), when paired up with ambiguous content, which may be due to inherent biases that humans have toward machines. As pointed out by Lee’s (2018) research on algorithm fairness and trust, while people can trust computers to do their job, they don't give them full trust because of the possibility of system glitches. When combined with ambiguously worded content, people may become less trusting toward machine moderation (compared to other users or unidentified sources) due to this inherent bias and possibility of error on the part of AI moderators. In other words, the window of trust appears to be smaller for AI compared to other users and unidentified moderation sources when content is obscure and does not lend itself to an unequivocal harassment interpretation.

With regard to the perception of the moderation process, we found that when the harassment comment was clearly harassing, participants expressed that they were more certain in their understanding of how moderation worked (i.e. more belief certainty) compared to when the comment was ambiguous. Interestingly, there was no difference in belief certainty depending on the moderation source, indicating that automatic moderation did not cause more doubts in their understanding of the moderation system. The literature suggests that when AI is understood and analyzed by lay users, AI systems can be better integrated in day-to-day activities (Hagras, 2018).

Perception of the moderation system was examined via participants’ assessment of the system's accountability and objectivity. Participants did not judge the system objectivity differently based on the moderation source, but there were significant differences in perceptions of system accountability depending on the moderator (i.e. an automated moderation system vs. other users or unknown source) when the comment was ambiguously harassing another user. While system objectivity refers to general fairness of the moderation system, accountability goes beyond fairness and captures whether the system is perceived to be acting in the best interest of all users (Rader et al., 2018). It is possible that in the more ambiguous situation, users are concerned about AI's moderation because of the above-mentioned inherent bias toward machines (Lee, 2018) foregrounded by ambiguous content. In other words, people may be willing to yield to machine judgments in clear-cut and unequivocal cases, but any kind of contextual ambiguity resurfaces underlying questions about the AI moderation system because of a possibility of errors and questions about whose interests and values it reflects. In contrast, when participants see “other users” or even an unidentified source flagging an ambiguous comment, they may be less likely to question their actions and intentions because the trust window is larger with other users than with AI moderation.

Finally, we looked at whether visibility of moderators impacted users’ agreement with the moderation decision and likelihood to look at the site content policy. Moderation source did not impact users’ agreement (vs. disagreement) with the moderation decision but the exposure to different type of comments led participants to take different actions on the site: in the clear (vs. ambiguous) harassment comment condition, participants were more likely to answer the agreement question than to skip it, presumably because they were more confident in judging the comment as harassment, and particularly to answer “yes” (vs. “no”). This kind of prompt may have important implications: the more harassment comments are reported by users, the higher the likelihood that the platform will remove the stated comment and the more the engagement of users on the platform (Jiménez Durán, 2022).

Overall, these results highlight how ambiguity in harassing comments strongly impacts perceptions of moderation decisions and moderation systems, especially when the moderator is an AI system. The results show that when users feel uncertain about whether the comment is harassing, they are more likely to question the moderation decision and the moderation system, similar to how they question the certainty of their own understanding of the system (Blackwell et al., 2018). Furthermore, the exposure to the same ambiguous content led to more questioning of AI moderation than the other moderation sources (i.e. other users and unknown sources). This finding adds a more nuanced understanding of folk theories about the role of a priori beliefs and direct experience with content moderation (DeVito et al., 2018; Eslami et al., 2016; French, 2017). It suggests that users assign different weights to direct experiences with different types of moderators (AI vs. other users or unknown source), likely because of the differences in a priori beliefs about them, and future research should examine an interplay of a prior beliefs and direct experiences in how humans perceive and interact with AI content moderation systems.

Limitations

As in any research, this study comes with some limitations. Despite our ability to reproduce a simulated social media environment to increase the ecological validity of the experiment, the social media site we used (EatSnap.Love) was new to our participants. Being in a new social media environment may have made our participants act differently than in their commonly used social media sites. Yet, even in this new environment and not knowing the other users of the site, participants still trusted more the moderation decision with ambiguous content when other users flagged the comment than when an automated system did so. Future research could extend the test of moderation source visibility to real world platforms, for example, by using field experiments, similar to the study done by Matias (2019). Second, the results do not elucidate participants’ interpretations of their experiences with the moderation system, which could be probed with open-ended questions and qualitative data. Finally, while we examined three different moderators, subsequent research should also look at the effect of commercial moderators on perceptions of moderation decisions and systems. Adding this fourth category would give a broader understanding of how people engage with different types of content moderation.

Conclusion

The findings from our study reveal the complexity of how users interact with different types of content moderation in social media platforms. Specifically, users are more likely to question AI moderators, especially how much they can trust their moderation decision and the moderation system's accountability, but only when moderation content is inherently ambiguous. In other words, rather than invariably trusting AI's moderation less and consistently seeing AI as less accountable compared to other users’ moderation, the difference only emerges with ambiguous content. Those findings are increasingly relevant to the question of whether AI should be used solely to moderate content (Gillespie, 2020), as they highlight the differences in how people engage with AI content moderation system compared to human moderators.

While media companies continue to use artificial intelligence to improve users’ experience (Moss, 2021) and to maintain engagement and profitability (Jiménez Durán, 2022), a promising approach could be identifying ways for AI and humans to work together as “moderating teammates,” in line with the “machines as teammates” paradigm (see Seeber et al., 2020). Even if AI could effectively moderate content, there is a necessity of human moderators as rules in community are constantly changing, and cultural context differ, so there is never a shared value system that everyone agrees on (Gillespie, 2020). Furthermore, AI moderators can help with content moderation scalability and reduce the burden on professional moderators who deal daily with harassment comments, violence, or conspiracies (Irwin, 2022). As our findings suggest, when a comment is clearly harassing another user, using AI as moderator does not seem to affect users’ perception of the moderation system. However, for an ambiguously harassing comment, platforms could rely on users rather than AI, or look for ways to engage automated moderation in a way that supports “partnership between human and automated moderation” (Gillespie, 2020: 4), for example, by AI tools providing users with contextual information when users are not sure about ambiguously worded harassment. Similar to how users expand their understanding of the breadth and impact of a harassment problem through the act of labeling negative online experiences as “online harassment” (Blackwell et al., 2018), users may increase their understanding of AI moderation systems through direct experience.

Footnotes

Acknowledgments

We want to thank the reviewers for their comments. We also thank Anna Spring for her help with developing the simulation, Katherine Miller with providing support with participant compensation, and research assistants Annika Pinch, Suzanne Lee and Hyun Seo (Lucy) Lee for their help with testing the simulation and managing data collection.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the USDA NIFA HATCH (grant number 2020-21-276).

ORCID iDs

Marie Ozanne

Natalya N Bazarova

References

Ananny

Crawford

(2018) Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society 20(3): 973–989. DOI: 10.1177/1461444816676645.

Banchik

(2020) Disappearing acts: content moderation and emergent practices to preserve at-risk human rights–related content. New Media & Society 23(6): 1527–1544.

Bargh

Chaiken

Govender

, et al. (1992) The generality of the automatic attitude activation effect. Journal of Personality and Social Psychology 62(6): 893.

Barocas

Hood

Ziewitz

(2013) Governing algorithms: A provocation piece. Available at SSRN 2245322.

Bates

Maechler

Bolker

, et al. (2015) Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67: 1–48.

Berkelaar

Harrison

(2017) Information visibility. In: Oxford Research Encyclopedia of Communication.

Bhandari

Ozanne

Bazarova

, et al. (2021) Do you care who flagged this post? Effects of moderator visibility on bystander behavior. Journal of Computer-Mediated Communication 26(5): 284–300.

Blackwell

Chen

Schoenebeck

, et al. (2018) When online harassment is perceived as justified. In: Twelfth International AAAI Conference on Web and Social Media.

Brown

Pearson

(2018) Social media, the online environment and terrorism. In: Routledge Handbook of Terrorism and Counterterrorism. Oxon, UK: Routledge, 149–164.

10.

Brunk

Mattern

Riehle

(2019) Effect of transparency and trust on acceptance of automatic online comment moderation systems. In: 2019 IEEE 21st Conference on Business Informatics (CBI), 2019, pp. 429–435. IEEE.

11.

Crawford

Gillespie

(2016) What is a flag for? Social media reporting tools and the vocabulary of complaint. New Media & Society 18(3): 410–428. DOI: 10.1177/1461444814543163.

12.

DeVito

Birnholtz

Hancock

, et al. (2018) How People Form Folk Theories of Social Media Feeds and What it Means for How We Study Self-Presentation. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal QC Canada, 19 April 2018, pp. 1–12. ACM. DOI: 10.1145/3173574.3173694.

13.

Diakopoulos

(2015) Algorithmic accountability: journalistic investigation of computational power structures. Digital Journalism 3(3): 398–415.

14.

Dietvorst

Simmons

Massey

(2015) Algorithm aversion: people erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General 144(1): 114.

15.

DiFranzo

Taylor

Kazerooni

, et al. (2018) Upstanding by Design: Bystander Intervention in Cyberbullying. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI ‘18, Montreal QC, Canada, 2018, pp. 1–12. ACM Press. DOI: 10.1145/3173574.3173785.

16.

Duarte

Llanso

Loup

(2018) Mixed messages? The limits of automated social Media content analysis. In: FAT, 2018, p. 106.

17.

Duggan

(2017) Witnessing online harassment. In: Pew Research Center: Internet, Science & Tech. Available at: https://www.pewresearch.org/internet/2017/07/11/witnessing-online-harassment/ (accessed 15 April 2022).

18.

Eslami

Karahalios

Sandvig

, et al. (2016) First I ‘like’ it, then I hide it: Folk Theories of Social Feeds. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose California USA, 7 May 2016, pp. 2371–2382. ACM. DOI: 10.1145/2858036.2858494.

19.

Eslami

Rickman

Vaccaro

, et al. (2015) ‘I always assumed that I wasn’t really that close to [her]’ Reasoning about Invisible Algorithms in News Feeds. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, 2015, pp. 153–162.

20.

Fox

(2007) The uncertain relationship between transparency and accountability. Development in Practice 17(4–5): 663–671.

21.

French

(2017) Algorithmic Mirrors: an Examination of How Personalized Recommendations Can Shape Self-perceptions and Reinforce Gender Stereotypes. PhD, Stanford University, United States, California. Available at: https://www.proquestcom/docview/2436884514/abstract/D2717BFCBCD441FPQ/1 (accessed 9 August 2021).

22.

Gillespie

(2014) The relevance of algorithms. Media Technologies: Essays on Communication, Materiality, and Society 167(2014): 167.

23.

Gillespie

(2018) Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. New Haven, CT: Yale University Press.

24.

Gillespie

(2020) Content moderation, AI, and the question of scale. Big Data & Society 7(2): 2053951720943234.

25.

Gollatz

Beer

Katzenbach

(2018) The Turn to Artificial Intelligence in Governing Communication Online. Berlin. https://www.ssoar.info/ssoar/handle/document/59528.

26.

Gorwa

(2018) Towards fairness, accountability, and transparency in platform governance. AoIR Selected Papers of Internet Research.

27.

Gorwa

Binns

Katzenbach

(2020) Algorithmic content moderation: technical and political challenges in the automation of platform governance. Big Data & Society 7(1): 2053951719897945.

28.

Grant

Button

Noseworthy

(1994) Predicting attitude stability. Canadian Journal of Behavioural Science/Revue Canadienne des Sciences du Comportement 26(1): 68.

29.

Grimmelmann

(2015) The virtues of moderation. Yale JL & Tech. 17: 42.

30.

Gross

Holtz

Miller

(1995) Attitude certainty. Attitude Strength: Antecedents and Consequences 4: 215–245.

31.

Hagras

(2018) Toward human-understandable, explainable AI. Computer 51(9): 28–36. DOI: 10.1109/MC.2018.3620965.

32.

Irwin

(2022) Two content moderators file class-action lawsuit against TikTok. Available at: https://www.protocol.com/bulletins/tiktok-content-moderation-lawsuit (accessed 14 April 2022).

33.

Jhaver

Appling

Gilbert

, et al. (2019a) ‘Did you suspect the post would be removed?’: understanding user reactions to content removals on reddit. Proceedings of the ACM on Human-Computer Interaction 3(CSCW): 1–33. DOI: 10.1145/3359294.

34.

Jhaver

Birman

Gilbert

, et al. (2019b) Human-Machine collaboration for content regulation: the case of reddit automoderator. ACM Transactions on Computer-Human Interaction 26(5): 1–35. DOI: 10.1145/3338243.

35.

Jiménez Durán

(2022) The economics of content moderation: theory and experimental evidence from hate speech on twitter. 4044098, SSRN Scholarly Paper, 25 February. Rochester, NY: Social Science Research Network. DOI: 10.2139/ssrn.4044098.

36.

Kempton

(1986) Two theories of home heat control. Cognitive Science 10(1): 75–90.

37.

Kroll

Huey

Barocas

, et al. (2016) Accountable algorithms’(2017). University of Pennsylvania Law Review 165: 633.

38.

Langos

(2012) Cyberbullying: the challenge to define. Cyberpsychology, Behavior, and Social Networking 15(6): 285–289.

39.

Lee

(2018) Understanding perception of algorithmic decisions: fairness, trust, and emotion in response to algorithmic management. Big Data & Society 5(1): 2053951718756684.

40.

Llanso

(2019) Platforms want centralized censorship. That should scare you. Wired, 18 April.

41.

Llansó

(2020) No amount of “AI” in content moderation will solve filtering’s prior-restraint problem. Big Data & Society 7(1): 2053951720920686.

42.

Matias

(2019) Preventing harassment and increasing group participation through social norms in 2,190 online science discussions. Proceedings of the National Academy of Sciences 116(20): 9785–9789. DOI: 10.1073/pnas.1813486116.

43.

Matias

Simko

Reddan

(2020) Study results: reducing the silencing role of harassment in online feminism discussions.

44.

Moss

(2021) Facebook Plans huge $29–34 billion capex spending spree in 2022, will invest in AI, servers, and data centers. Available at: https://www.datacenterdynamics.com/en/news/facebook-plans-huge-29-34-billion-capex-spending-spree-in-2022-will-invest-in-ai-servers-and-data-centers/ (accessed 14 April 2022).

45.

Munger

(2017) Tweetment effects on the tweeted: experimentally reducing racist harassment. Political Behavior 39(3): 629–649. DOI: 10.1007/s11109-016-9373-5.

46.

Myers West

(2018) Censored, suspended, shadowbanned: user interpretations of content moderation on social media platforms. New Media & Society 20(11): 4366–4383. DOI: 10.1177/1461444818773059.

47.

Nadali

Murad

MAA

Sharef

, et al. (2013) A review of cyberbullying detection: An overview. In: 2013 13th International Conference on Intellient Systems Design and Applications, 2013, pp. 325–330. IEEE.

48.

Noble

(2018) Algorithms of Oppression. New York: NYU Press.

49.

OECD (2017) Trust and public policy: how better governance can help rebuild public trust Available at: https://www.oecd-ilibrary.org/governance/trust-and-public-policy_9789264268920-en (accessed 9 August 2021).

50.

Pater

Kim

Mynatt

, et al. (2016) Characterizations of online harassment: Comparing policies across social media platforms. In: Proceedings of the 19th International Conference on Supporting Group Work, 2016, pp. 369–374.

51.

Rader

Cotter

Cho

(2018) Explanations as mechanisms for supporting algorithmic transparency. In: Proceedings of the 2018 CHI conference on human factors in computing systems, 2018, pp. 1–13.

52.

Roberts

(2016) Commercial content moderation: Digital laborers’ dirty work.

53.

Santa Clara Principles (2018) Santa Clara principles on transparency and accountability in content moderation. Available at: https://santaclaraprinciples.org/images/scp-og.png (accessed 9 August 2021).

54.

Seeber

Bittner

Briggs

, et al. (2020) Machines as teammates: A research agenda on AI in team collaboration. Information & Management 57(2): 103174. DOI: 10.1016/j.im.2019.103174.

55.

Seering

Wang

Yoon

, et al. (2019) Moderator engagement and community development in the age of algorithms. New Media & Society 21(7): 1417–1443.

56.

Shin

Park

(2019) Role of fairness, accountability, and transparency in algorithmic affordance. Computers in Human Behavior 98: 277–284.

57.

Sundar

Nass

(2001) Conceptualizing sources in online news. Journal of Communication 51(1): 52–72.

58.

Suzor

West

Quodling

, et al. (2019) What do we mean when we talk about transparency? Toward meaningful transparency in commercial content moderation. International Journal of Communication 13: 18.

59.

Taylor

DiFranzo

Choi

, et al. (2019) Accountability and empathy by design: encouraging bystander intervention to cyberbullying on social Media. Proceedings of the ACM on Human-Computer Interaction 3(CSCW): 1–26.

60.

ter Hoeven

Stohl

Leonardi

, et al. (2021) Assessing organizational information visibility: development and validation of the information visibility scale. Communication Research 48(6): 895–927.

61.

Tyler

(1996) The relationship of the outcome and procedural fairness: how does knowing the outcome influence judgments about the procedure? Social Justice Research 9(4): 311–325. DOI: 10.1007/BF02196988.

62.

Vogels

(2021) The state of online harassment. In: Pew Research Center: Internet, Science & Tech. Available at: https://www.pewresearch.org/internet/2021/01/13/the-state-of-online-harassment/ (accessed 9 August 2021).

63.

Waddell

(2018) What does the crowd think? How online comments and popularity metrics affect news credibility and issue importance. New Media & Society 20(8): 3068–3083. DOI: 10.1177/1461444817742905.

64.

Yang

Stoyanovich

(2017) Measuring fairness in ranked outputs. In: Proceedings of the 29th international conference on scientific and statistical database management, 2017, pp. 1–6.

Shall AI moderators be made visible? Perception of accountability and trust in moderation systems on social media platforms

Abstract

Keywords

Introduction

Literature review

Sources of moderation

Fairness and Trust in the Moderation Decision

Agreement with the Moderation Decision and Likelihood to Look at the Site Content Policy

Belief Certainty in the Moderation Process and Perceived Accountability and Objectivity of the Moderation System

Ambiguity in moderation

Methods

Participants

Experimental design

Procedure

Measures

Results

Perception of moderation decision: fairness and trust

Perception of moderation process: belief certainty

Perception of moderation system: accountability and objectivity

Likelihood to check content policy and agreement with the moderation decision

Discussion

Limitations

Conclusion

Footnotes

Acknowledgments

Declaration of conflicting interests

Funding

ORCID iDs

References