Abstract
Contemporary policy debates about managing the enormous volume of online content have renewed their focus on upload filtering, automated detection of potentially illegal content, and other “proactive measures”. Often, policymakers and tech industry players invoke artificial intelligence as the solution to complex challenges around online content, promising that AI is a scant few years away from resolving everything from hate speech to harassment to the spread of terrorist propaganda. Missing from these promises, however, is an acknowledgement that proactive identification and automated removal of user-generated content raise problems beyond issues of “accuracy” and overbreadth: problems that will not be solved with more sophisticated AI. In this commentary, I discuss how the technical realities of content filtering stack up against the protections for freedom of expression in international human rights law. As policymakers and companies around the world turn to AI for communications governance, it is crucial that we recall why legal protections for speech have included presumptions against prior censorship, and consider carefully how proactive content moderation will fundamentally re-shape the relationship between rules, people, and their speech.
One of the defining characteristics of online communication is its massive scale: Service providers that host user-generated content quickly face such an enormous volume of material that it is impossible for human beings to review the content before it is posted. 1 It can even be difficult for humans to manage the task of reviewing only content that has been flagged or reported (Weber and Seetharaman, 2017).
To respond to this unimaginable scale of content, many hosts have developed reactive content moderation systems. Reactive systems wait for a user (or a court order) to bring content to the host’s attention, at which point the host evaluates it according to the host’s content standards. Reactive moderation systems, however, evaluate a very small amount of the content on a given site (Suzor, 2018). 2 They depend on users’ willingness to flag content and may not be effective at curtailing certain kinds of abuse, such as illicit content posted in private groups.
Thus, many hosts have also explored using proactive methods to detect potentially problematic content. Proactive methods can include manual techniques (holding posts in a queue for human evaluation prior to publication) as well as automated systems that use filters or that evaluate network-level signals such as IP address and posting behavior in order to preemptively block spam and malware (Ramachandran and Feamster, 2006).
Many of today’s law, policy, and technical conversations about using (so-called) artificial intelligence (AI) in content moderation discuss two different but related concepts: proactive detection of content and automated evaluation of that content. AI tools might be used in the detection of potentially problematic posts to reduce the reliance on users (or courts) to flag content for review. AI tools might also be used to evaluate whether a post violates the host’s content policy (or the law) and to enforce that decision by automatically removing or demoting the content. One assumption underlying these conversations is that more sophisticated “AI” tools will avoid the over-blocking and easy circumvention that characterized simpler approaches to filtering.
Large social media platforms face substantial pressure over their content moderation systems from public authorities, advertisers, and their users, and are increasingly relying on automated and proactive methods for detecting and evaluating user content. While this trend is understandable from a practical perspective, the widespread use of proactive detection and automated evaluation in content moderation represents a significant shift in how we have conceived of speech regulation—and protection—under the international human rights framework. Moreover, many of these problems stem from the basic concept of filtering itself; advancements in the technology of filtering will not address all of the underlying human rights concerns.
Upload filters and other “proactive measures”
The concepts of proactive detection and automated evaluation are invoked (and often conflated) by a variety of terms and phrases in policy discussions: “monitoring obligations” (European Parliament and Council, 2000: article 15), “notice and staydown” (Bailey, 2017), “upload filtering” (European Digital Rights, 2019), “automatic detection and removal of content” (European Council, 2017: para. 2), and the generic “proactive measures” (European Commission, 2018: article 6). Each of these phrases essentially refers to the concept of filtering, or screening uploaded content to determine whether it contains prohibited elements.
Filters can employ a variety of technical approaches, including relatively simple techniques such as keyword filtering and hash matching. In keyword filtering, the text of a post is scanned to identify words or phrases that have been blacklisted. In hash matching, an algorithm is applied to a file to generate a unique digital fingerprint or “hash” of that file. Every new upload is also hashed and compared to that fingerprint. Hash matching has been used in systems aimed at blocking repeated uploads of the same content, including in cases of potential copyright infringement (YouTube, 2019) and child sexual abuse imagery (Microsoft, 2019). Depending on the moderation system, content that triggers a keyword or hash filter may be automatically blocked from upload, sent to a human for review of the file’s context (e.g., in a news broadcast or as part of a transformative fair use), uploaded with the offending content removed, or monetized by a third party (e.g., in cases of alleged unlicensed use of a copyrighted work).
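To make these mechanics concrete, the sketch below (in Python, with invented blacklist entries, reference hashes, and function names that are purely illustrative and not drawn from any host’s actual system) shows the basic logic of keyword filtering and exact hash matching. Production hash-matching systems typically use perceptual hashes that tolerate minor alterations, rather than the cryptographic hash used here.

```python
import hashlib

# Hypothetical blacklist and reference hashes, for illustration only.
BLACKLISTED_KEYWORDS = {"examplebannedword", "anotherbannedword"}
BLOCKED_FILE_HASHES = {
    # SHA-256 digests of files the host has previously decided to block
    "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
}

def matches_keyword_filter(post_text: str) -> bool:
    """Return True if the post contains any blacklisted keyword."""
    tokens = post_text.lower().split()
    return any(token in BLACKLISTED_KEYWORDS for token in tokens)

def matches_hash_filter(file_bytes: bytes) -> bool:
    """Return True if the upload is byte-identical to a known blocked file.

    A cryptographic hash only matches exact copies; changing even one byte
    of the file produces a different digest.
    """
    return hashlib.sha256(file_bytes).hexdigest() in BLOCKED_FILE_HASHES

# Screening an upload before publication: a match might trigger automatic
# blocking, human review, or another enforcement action.
if matches_keyword_filter("user supplied text") or matches_hash_filter(b"file"):
    print("Upload flagged by filter")
```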
Both keyword filtering and hash matching require the entity applying the filter to know in advance what it wants to block, i.e. the list of keywords or the reference files to hash. These types of filters are also relatively easy to circumvent, because altering the content by changing the spelling of a word, cropping an image, or adding a caption can defeat the filter’s ability to identify a match (Fishman, 2019). As pressure grows for content hosts to police more complex kinds of speech, there is increasing interest in using more sophisticated machine-learning techniques, such as natural language processing and object recognition, to try to identify novel examples of prohibited types of content (Llansó et al., 2020). 3
Proactive content moderation enacts prior restraints on speech
Regardless of the specific technical approach, applying a filter to all user-provided content at the moment of upload bears many of the characteristics of the legal concept of prior restraint on speech. The international human rights framework, as articulated in the Universal Declaration of Human Rights (United Nations General Assembly, 1948) and implemented in treaties such as the International Covenant on Civil and Political Rights (United Nations General Assembly, 1966) (ICCPR), is a set of treaties, laws and principles that recognize the fundamental rights and freedoms enjoyed by all human beings and that lay out states’ obligations to protect and promote those rights. Article 19 of the ICCPR guarantees the right to freedom of expression, which “shall include freedom to seek, receive and impart information and ideas of all kinds, regardless of frontiers, either orally, in writing or in print, in the form of art, or through any other media of his choice.”
Limitations on the right to freedom of expression must be provided by law, pursuant to a legitimate aim, and must be necessary and proportionate measures to achieve that aim (United Nations Human Rights Committee, 2011).
Prior restraint, or prior censorship, refers to circumstances where a speaker must seek approval from some empowered third party (typically, a public official) before she is allowed to speak or to publish her views. There is a strong presumption against the validity of prior censorship in international human rights law and in the case law interpreting national constitutional protections for freedom of expression (Bantam Books v. Sullivan, 1963: 70; Lanza, 2017; United Nations Human Rights Committee, 2011). As the Supreme Court of Mexico has noted (Lanza, 2017: 43), this presumption is partly guided by “the need to guarantee that, in principle, no persons, groups, ideas, or means of expression are excluded a priori from public discourse.” Rather than endure proactive filtering of their speech, people must be free to speak first and then face punishment afterwards for any law or rule they have violated. 4
A variety of rationales support the presumption against prior censorship in human rights law, including concerns that: (a) systems of administrative prior censorship bring too much speech into the scope of government review, (b) when censorship becomes convenient it becomes too common, and (c) such systems are too shielded from public scrutiny about what the rules governing speech are and how they are being applied (Emerson, 1970: 506). The following sections discuss the ways in which proactive content moderation can raise all of these same concerns.
Exposing more speech to evaluation and approval
Most people rarely encounter government scrutiny of their offline speech. The overwhelming majority of people will never face criminal charges or civil claims on the basis of statements they have made. They will never have the standards articulated in law applied directly to their own speech. This is not necessarily because people do not say things that could violate the law; people will utter threats and incite violence before most violent encounters, or share slanderous statements as gossip, or express opinions that amount to illegal hate speech. But these statements are fleeting, often expressed to limited audiences, and typically not overheard by anyone likely to report the speaker to law enforcement. “Perfect” enforcement of laws regulating speech has never been possible offline.
First Amendment scholar Jack Balkin (2014) describes one of the characteristics of systems of prior restraint as “Deliberate Overbreadth of Coverage: … [P]rior restraints subject a much greater breadth and variety of content to government scrutiny and surveillance than a system of subsequent prosecution and punishment.” Filters inherently exert this kind of expanded scrutiny: Whether imposed by government mandate or corporate practice, filters treat every image, video, or statement as a potential rule violation. Moderation systems that use filters may block a flagged file or may (at best) require that the post be pre-approved by a human moderator.
Online, previously transient statements take on a more durable status and can be viewed by a potentially worldwide audience. Moreover, there are multiple technical intermediaries that facilitate every communication, each of whom can potentially scrutinize that speech (Kaye, 2015). This alone presents a substantial shift in how we experience our right to freedom of expression. Conversations held face-to-face are protected, practically and often legally, from surveillance by public or private actors. Online, these same conversations are copied, processed, and stored by multiple intermediaries. If these intermediaries apply filters to all of this content, then each of these conversations may be subject to multiple layers of a third party’s pre-approval, using techniques that are “deeply intrusive of users’ right to privacy and freedom of expression” (ARTICLE 19, 2016: 16).
Removing procedural hurdles to censorship
Procedural safeguards in our legal systems, such as requirements for an independent arbiter, the opportunity for the speaker to defend himself before punishment is imposed, and the opportunity to appeal a decision, protect freedom of expression both directly (by ensuring that the law is applied fairly and that errors are corrected) and indirectly (by making it somewhat burdensome to prosecute a case). Procedural safeguards introduce friction into our systems for adjudicating the illegality of speech to ensure that it takes more than “a stroke of a pen” to suppress speech (Emerson, 1970: 506). This notion of intentional friction is fundamentally in tension with the push for upload filtering and other automated content moderation designed to ensure that more speech is blocked or removed more quickly.
When we lose this friction, error and overbreadth can distort the environment for speech. 5 Keyword and hash-matching filters are vulnerable to overbroad application of whatever rule they seek to impose, as they do not take context into account and may make mistakes in identifying matches (Engstrom and Feamster, 2017). 6 Filters that leverage machine learning, such as natural language processing tools, also can have a variety of limitations that lead to overbroad restrictions on speech, including under-representation of certain speakers in the training data, failure to develop a clear and consistent definition of prohibited content, and limited utility of tools outside the specific context in which they were trained (Duarte et al., 2017).
Approaches to addressing error in machine-learning systems, moreover, are fundamentally different from due process protections aimed at ensuring a just result. Machine-learning tools can be evaluated on their “accuracy,” but “accuracy” in this sense typically refers to the rate at which the tool’s evaluation of content matches a human’s evaluation of the same content. This kind of analysis does not address whether the human evaluation of the content is correct—“accuracy”, in this context, does not reflect assessment of ground truth. Efforts to improve filtering tools may focus on bringing mistakes to within an “acceptable” range without considering the impact on different speakers or improving the relation to ground truth.
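A simple, hypothetical illustration (the labels and numbers below are invented) shows what such an “accuracy” figure typically measures: agreement between the tool and human reviewers, not correctness.

```python
# Hypothetical evaluation data: 1 = "violates policy", 0 = "does not violate".
# The human labels serve as the benchmark, but they are not ground truth;
# reviewers can misapply the policy or disagree with one another.
human_labels      = [1, 0, 1, 1, 0, 0, 1, 0]
classifier_labels = [1, 0, 1, 0, 0, 0, 1, 1]

agreement = sum(
    h == c for h, c in zip(human_labels, classifier_labels)
) / len(human_labels)

print(f"Reported 'accuracy': {agreement:.0%}")  # 75% agreement with reviewers
# This figure measures how often the tool reproduces the human decision, not
# how often either the tool or the human reached the correct outcome, and it
# says nothing about which speakers bear the cost of the errors.
```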
Human rights law, however, requires more than just optimizing a speech-regulation system to keep its error rate low; it requires individualized determinations by independent arbiters (see Nunziato, 2014). When speech issues are adjudicated on a case-by-case basis, rather than through a quickly administered system of prior censorship or filtering, people are able to make the argument to an independent adjudicator that the governing standard has been misapplied in their case (or that the standard is so vulnerable to mis-application that it should be struck down entirely). Ideally, this happens before the consequence of the decision is meted out, but at the very least speakers must be able to appeal restrictions on their freedom of expression to an independent arbiter (Yildirim v. Turkey, 2012).
Filtering can be implemented in a variety of ways, only some of which enable procedural safeguards. Some systems are set up to identify content for human review, which enables a human to confirm or reject the evaluation made by the automated tool. But these systems are vulnerable to being overwhelmed, as in the high-profile example of YouTube circumventing its own human-review systems in the immediate aftermath of the Christchurch shooting, when copies of the shooter’s video were uploaded with extreme frequency (Dwoskin and Timberg, 2019). Other moderation systems are designed from the outset to automatically enforce the upload filter’s decisions; even if appeals are available, the decision has been made and the consequences applied before the person affected has the opportunity to contest the overbroad application of the rule (Klonick, 2018). Error in a system that automates enforcement means that speakers are being punished before they have the chance to appeal.
Reducing scrutiny of systems of censorship
A third characteristic of systems of prior restraint is their ability to function as what Balkin (2014) terms “Low-Visibility Systems of Control … [that] can operate in the background, outside of public scrutiny.” Systems of prior restraint exist in tension with the requirement in international human rights law that limitations on expression be “provided by law” which is “formulated with sufficient precision to enable an individual to regulate his or her conduct accordingly” (United Nations Human Rights Committee, 2011). In other words, people have the right to know what rules will be applied to their speech.
Filtering can significantly interfere with people’s ability to perceive, understand, and scrutinize the systems of censorship that are shaping their online information environment. Content moderation systems can apply upload filters in non-transparent ways. One example of this is the shared hash database for alleged terrorist propaganda that was created by Facebook, YouTube, Microsoft, and Twitter (Fishman, 2019). Participating companies contribute hashes of what they consider to be terrorist propaganda and can decide to use these hashes in their own content moderation systems. Crucially, none of the participating companies discloses to users when their content has been blocked because it matched something in the hash database. Most users are unaware that the database exists and there are no procedures for appealing the inclusion of an image or video in this system. Users are left unable even to perceive the action of this element of the content moderation system that is governing their speech on these platforms.
Filtering can also exacerbate the problem of vague rules or standards. As noted above, international human rights law requires that a reasonable person be able to understand the rules that govern their speech, so that they can choose to modify their behavior and avoid punishment. Many content policies, however, are difficult to understand; users are often unsure of why certain speech is or is not deemed a violation of a stated rule. This is partly due to the challenge of enforcing a rule at scale, which inevitably leads to apparent inconsistencies that cloud people’s understanding of the underlying rule. But part of the challenge also lies in the task that a content policy is trying to accomplish: Content policies are often more restrictive than what governments can censor under international human rights standards, and they are applied to a multi-faceted user base drawn from an enormous variety of cultural and linguistic contexts. So content policies are at once trying to be more fine-grained about their prohibitions than a law, and trying to regulate the speech of many more different kinds of people than any existing law.
The use of machine-learning tools can pose a particular barrier to people’s ability to scrutinize these systems, if they are not designed in ways that enable transparency or explanation. For example, when a machine-learning program develops a classifier to distinguish between two types of statements (e.g., hate speech and non-hate speech), the features it uses may not be translatable to concepts that humans would understand (Zhang et al., 2018). If the features can be described in understandable terms, they may not map cleanly to semantic concepts related to the relevant content policy (Wang, 2018). A hate-speech detection tool using word embeddings, for example, may distinguish between hate speech and non-hate speech due to the syntax and vocabulary used in hateful expression in the training data, rather than an estimation of, e.g. whether a statement is intended to incite discrimination or violence. There are a variety of ways to approach explainability of algorithms, including transparency into what aspects or elements of the underlying data were particularly important in developing the classifier, and methods for auditing the effects of an algorithm (Gilpin et al., 2019). If we do not emphasize the need for explainability in the technologies of content moderation systems, we will lose the ability to understand, much less direct or improve, the systems that govern our communications.
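As a rough sketch of this kind of feature-level transparency (a toy bag-of-words classifier trained on invented examples, rather than the word-embedding systems used in practice), one can surface the tokens a model weighs most heavily and see that the resulting “explanation” is a list of vocabulary items, not anything resembling an assessment of intent.

```python
# Toy example with invented data: inspecting which tokens a simple text
# classifier relies on. The "explanation" is surface vocabulary, not an
# assessment of whether a statement was intended to incite discrimination
# or violence.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "they are vermin and should leave",       # labeled as hate speech
    "those people are vermin get them out",   # labeled as hate speech
    "the vermin problem in my garden",        # labeled as benign
    "welcome everyone to the meeting",        # labeled as benign
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(features, labels)

# Rank tokens by learned weight: the highest-weighted features are simply
# words that co-occurred with the hateful label in the training data.
ranked = sorted(
    zip(vectorizer.get_feature_names_out(), model.coef_[0]),
    key=lambda pair: pair[1],
    reverse=True,
)
for token, weight in ranked[:5]:
    print(f"{token}: {weight:+.2f}")
```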
Conclusion
In 2011, the four International Mechanisms for Promoting Freedom of Expression issued the Joint Declaration on Freedom of Expression and the Internet, which included the statement: “Content filtering systems which are imposed by a government or commercial service provider and which are not end-user controlled are a form of prior censorship and are not justifiable as a restriction on freedom of expression.” 7 These experts in human rights law recognized then the significant threat to free expression posed by filtering, and while technologies to automatically detect and evaluate content have grown more sophisticated over the years, those technical advances do not address many of the core problems of filtering. Filtering acts as a prior restraint on speech, regardless of the “accuracy” of the tool being used.
Content hosts that deploy filters must recognize the human rights risks inherent in their content moderation systems and work to mitigate them. And governments—in democracies and authoritarian regimes alike—must recall the presumption against prior censorship and refrain from imposing communications governance systems that are incompatible with their obligations under human rights law. Regardless of advances in machine learning, filtering mandates are a threat to freedom of expression.
Acknowledgment
Thank you to my colleagues Liz Woolery and Natasha Duarte for their comments and feedback on this piece.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
