Abstract
In recent years, artificial intelligence has been deployed by online platforms to prevent the upload of allegedly illegal content or to remove unwarranted expressions. These systems are trained to spot objectionable content and to remove it, block it, or filter it out before it is even uploaded. Artificial intelligence filters offer a robust approach to content moderation which is shaping the public sphere. This dramatic shift in norm setting and law enforcement is potentially game-changing for democracy. Artificial intelligence filters carry censorial power, which could bypass traditional checks and balances secured by law. Their opaque and dynamic nature creates barriers to oversight, and conceals critical value choices and tradeoffs. Currently, we lack adequate tools to hold them accountable. This paper seeks to address this gap by introducing an adversarial procedure: Contesting Algorithms. It proposes to deliberately introduce friction into the dominant removal systems governed by artificial intelligence. Algorithmic content moderation often seeks to optimize a single goal, such as removing copyright-infringing materials or blocking hate speech, while other values in the public interest, such as fair use or free speech, are often neglected. Contesting algorithms introduce an adversarial design which reflects conflicting values, and thereby may offer a check on dominant removal systems. Facilitating an adversarial intervention may promote democratic principles by keeping society in the loop. An adversarial public artificial intelligence system could enhance dynamic transparency, facilitate an alternative public articulation of social values using machine learning systems, and restore societal power to deliberate and determine social tradeoffs.
Introduction
In recent years, machine learning (ML) systems and artificial intelligence (AI) have been deployed by online platforms to prevent the upload of allegedly illegal content or to remove illicit expressions. ML algorithms are trained to spot objectionable content, and to remove it, block it, or filter it out before it is even uploaded (Gollatz et al., 2018). In the wake of the COVID-19 pandemic, Facebook has announced it would send home its human reviewers of content and would increase its “reliance on proactive detection to remove violating content” (Facebook News, 2020). This move to fully automated moderation is not simply a technical transition. AI offers a robust approach to content moderation, which functions at scale.
AI filtering systems are endowed with censorial power which could bypass traditional checks and balances secured by law (Heldt, 2019). ML algorithms take over decision-making power normally assigned to courts and administrative agents (Bridy, 2016; Karaganis and Urban, 2015; Ofcom, 2019). Algorithmic enforcement may unduly restrict the freedom of users to access, experience, transform, and share creative materials, such as scientific publications, cultural assets, and news reports (Tehranian, 2015; Tushnet, 2019). Such access is essential for learning and education, and for developing innovative markets and an informed citizenry. Those interests are protected by legal rights, including freedom of expression and fair use (Geiger and Izyumenko, 2019). The opaque and dynamic nature of ML algorithms make it harder to ensure these rights are adequately protected.
Moreover, ML systems do not simply automate the enforcement of speech restrictions. As further described in this paper, the rise of content moderation by AI is transforming the nature of speech regulation. Speech regulation, in the broad sense, defines the scope of permissible speech through social and legal norms (Klonick, 2018). AI introduces a new type of governance, one based on dynamic and adaptive decision-making processes driven by data, correlations, and predictions. Governing speech through AI involves collecting massive amounts of data and applying data analytics techniques to identify patterns and correlations in order to predict trends and outcomes. Such predictions are followed by automatic filtering or removal of expressions.
Speech regulation by AI is fundamentally different from regulation by norms. It is designed to optimize a single tradeoff defined by the system designer. As such, it often overlooks the interests of all stakeholders and does not necessarily reflect larger societal tradeoffs. Moreover, the opaque, dynamic, and adaptive nature of AI tools create significant barriers to public oversight (Katyal, 2019; Selbst and Barocas, 2018). We currently lack adequate tools to hold these systems accountable (Whittaker et al., 2018). Overall, content moderation by AI reflects the rise of unchecked private power, which may escape traditional checks and balances intended to ensure that power is exercised in the interest of society at large.
This paper takes a “law by design” approach (Mulligan and Bamberger, 2018; Nissenbaum, 2011) to address these gaps. It proposes to mitigate the problems inherent in AI-based content moderation by introducing an adversarial approach which I call Contesting Algorithms.
The paper introduces the strategy of contesting algorithms and describes the necessary regulatory measures for promoting its implementation. It proceeds as follows: the next section describes the use of AI to enforce speech restrictions by online platforms. Following that, I describe how the shift from legal norms to enforcement by AI affects the public sphere, and I analyze the ensuing challenges to fundamental democratic principles. Finally, I introduce the strategy of contesting algorithms and analyze the various implications of an adversarial approach. The paper ends with some concluding remarks.
AI in content moderation
Over the past decades, online platforms have become de facto arbiters of speech (Klonick, 2018). As the amount of user-generated content ballooned, early systems of human review were quickly and thoroughly supplanted by automated mechanisms for controlling uploaded content (Sag, 2017; Tehranian, 2015). In this section, I describe the rise of AI in content moderation by platforms as the dominant paradigm for governing speech in the public sphere, and I discuss how the legal regime has shaped the design of these systems.
Private enforcement of speech regulation
Online communication is mediated by a handful of global platforms, such as Facebook, Twitter, and Google, which effectively govern online access to content, to speakers, and to the potential audience for every expression (Klonick, 2018). Social media platforms, which either host content (e.g. Facebook) or facilitate access to content (e.g. Google), may disable access by filtering, removing, or blocking objectionable expressions. As such, platforms offer a natural point of control for monitoring, filtering, blocking, and disabling access to content (Keller, 2019).
This gatekeeping function has made platforms an ideal partner for civil and criminal law enforcement (Bridy, 2010; Zittrain, 2006). Since the late 1990s, different laws have sought to encourage online intermediaries to moderate online content, in exchange for immunity from liability for content posted by users (the so-called “safe harbor” regime). A classic example is the U.S. Digital Millennium Copyright Act (DMCA), which requires platforms to expeditiously remove alleged copyright-infringing materials upon receiving notice from rights holders (17 U.S.C. § 512). Similarly, in the European Union, safe harbor provisions under the E-Commerce Directive 2000/31 exempt hosting platforms from liability for content posted by their users, provided that they did not modify that content and were not aware of its illegal character (Angelopoulos, 2016; Husovec, 2017). Once notice is received, the platform is obliged to promptly remove the content.
Newly introduced laws in Europe require a swifter response from online platforms, meaning they must actively engage in content moderation at the outset. For example, the recently introduced Network Enforcement Act (NetzDG) in Germany requires intermediaries to delete content that is evidently unlawful within 24 hours of a complaint being filed (DEUTSCHER BUNDESRAT: DRUCKSACHEN [BR-Drs.] 536/17 (30 June 2017)). Similarly, a recent proposal by the European Commission would require hosting service providers to remove or disable access to terrorist content online within one hour of receipt of a removal order (European Commission, 2018). The recently adopted Article 17 of the Digital Single Market Directive explicitly requires hosting platforms to ensure from the outset the unavailability of infringing content posted by users, in order to avoid strict liability (Article 17(1), European Parliament (26 March 2019)). To comply with such regulations and provide the mandatory swift response, platforms must deploy algorithmic measures in accordance with high industry standards of professional diligence.
Content moderation by AI
With the amount of user-generated content growing exponentially, it has become impossible to rely on human review for content moderation (Ofcom, 2019). Platforms have developed automated content filters and high-speed removal systems to efficiently fulfill their obligations under the safe harbor regime (Karaganis and Urban, 2015: 28–30). The expanding liability of online platforms for potentially harmful content posted by their users is further fostering the deployment of ML algorithms (Gollatz et al., 2018; Keller, 2018: 5–8). These systems are designed to optimize the speedy detection of potentially harmful content. The technical definition of “harmful” content may change depending on the functional purpose of the system—whether to enforce the platform’s contractual standards (e.g. the definition of hate speech in Facebook’s Community Standards) or to ensure compliance with legal requirements (e.g. “incitement to hatred” pursuant to the NetzDG).
ML tools have been deployed in a variety of contexts. Intellectual property enforcement is one example. For instance, Scribd, a subscription-based digital library of e-books and audiobooks, employs a system called BookID to generate a digital fingerprint for each book based on semantic data (e.g. word counts, letter frequency, phrase comparisons). Texts uploaded to Scribd are scanned by BookID, and content which matches any BookID fingerprint is blocked. Similarly, Amazon’s Project Zero uses ML to continuously scan product listing updates, and to proactively remove suspected counterfeits, based on logos, trademarks, and key data provided by its partnering brands (Mehta, 2019).
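The mechanics of such fingerprint matching can be illustrated with a toy sketch. The features (word counts, letter frequencies) follow the description above, but the similarity metric, the threshold, and all function names are illustrative assumptions rather than Scribd's actual BookID implementation.

```python
# Toy sketch of text fingerprinting and matching, in the spirit of systems like BookID.
# All thresholds and design choices here are assumptions for illustration only.
import re
from collections import Counter

def fingerprint(text: str) -> Counter:
    """Build a crude fingerprint from word frequencies and letter frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    letters = (c for c in text.lower() if c.isalpha())
    fp = Counter(words)
    fp.update("#" + c for c in letters)  # prefix letters so they do not collide with words
    return fp

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two fingerprints (1.0 means identical frequency profiles)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_blocked(upload_text: str, registry: dict, threshold: float = 0.9) -> bool:
    """Block the upload if it closely matches the fingerprint of any registered work."""
    up = fingerprint(upload_text)
    return any(similarity(up, ref) >= threshold for ref in registry.values())
```

The point of the sketch is simply that the blocking decision reduces to a numeric comparison against a registry of fingerprints, with the threshold itself encoding a policy choice.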
Some platforms have also developed voluntary measures which exceed their legal obligations under the safe harbor regime, and which open up new business opportunities. YouTube’s Content ID is a classic example. Using a digital identifying code, Content ID can detect and notify rights holders whenever a newly uploaded video matches a work that they own. Rights holders can then choose to block or remove the content, share information, or monetize the content (Perel and Elkin-Koren, 2016; Sag, 2017).
Platforms also make use of ML tools to flag and remove other types of harmful speech, such as hate speech (Kan, 2019) or terrorist propaganda. For instance, the Global Internet Forum to Counter Terrorism, an industry-led non-profit organization, and Tech Against Terrorism, a public–private partnership launched by the United Nations Counter Terrorism Executive Directorate, use hashtags to tackle terrorist propaganda and extremist content. The database of hashtags (unique digital fingerprints of terrorist recruitment videos or violent terrorist imagery) is kept secret. Participating platforms (including Facebook, Twitter, and Microsoft) use AI to filter out terrorist propaganda by detecting images and videos which match this privately held database of content hashes (Heller, 2019; Keller, 2018: 6–7).
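A minimal sketch of the matching mechanism may help. The shared database itself is private, so its contents, the choice of SHA-256 (real systems typically rely on perceptual hashes that survive re-encoding and cropping), and the function names below are all assumptions.

```python
# Minimal sketch of filtering against a shared database of known-content hashes.
# The real GIFCT-style database is private; an exact SHA-256 match is used here
# only to illustrate the mechanism.
import hashlib

shared_hash_db = set()  # hashes contributed by participating platforms (hypothetical contents)

def content_hash(data: bytes) -> str:
    """Exact cryptographic hash; production systems typically use perceptual hashes."""
    return hashlib.sha256(data).hexdigest()

def should_filter(upload: bytes) -> bool:
    """Filter the upload if its hash appears in the shared database."""
    return content_hash(upload) in shared_hash_db

def register_removed_content(data: bytes) -> None:
    """A platform that removes content adds its hash so that other platforms can block re-uploads."""
    shared_hash_db.add(content_hash(data))
```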
Some systems are based on prediction and prevention (Katsh and Rabinovich-Einy, 2017). Under this approach, ML algorithms are trained to spot objectionable content before it is seen by its potential audience. YouTube, for instance, recently announced that it was using AI to identify extremist content (YouTube Enforcement quarterly report, 2018). According to YouTube, more than 83% of the videos it deleted during the surveyed period were flagged by AI (Meyer, 2018), and three-quarters of those were deleted before accruing any views (O’Flaherty, 2018). Using ML, platforms may even seek to predict, and prevent, behaviors which have never occurred, and possibly may never occur, based on real-time data on uploaded content and users. One example is the deployment of ML to monitor live chats and video meta-data so as to predict copyright-infringing live video streams of sports games (Zhang et al., 2018). To prevent live streaming of harmful content, such as footage of a murder or terrorist attack (Dreyfuss, 2017) or self-inflicted harm (Segarra, 2019), platforms must develop the capacity to predict and expeditiously block the distribution of such footage before it becomes viral (Ofcom, 2019). ML is also deployed for “takedown and stay down” purposes, which involves active monitoring of uploads to ensure that objectionable content which has been removed, or any similar content, is not reloaded (Bridy, 2016).
Overall, the use of AI-based measures may be inescapable when it comes to managing massive amounts of user-generated content with both uniformity and particularity (Bamberger, 2010). Yet using these systems to govern speech also raises serious concerns from a social welfare perspective. These implications are discussed in the next section.
The algorithmic public sphere
ML introduces a new way of governing online speech. As the use of ML for online content moderation becomes ubiquitous, it challenges the principles which have traditionally guided the public sphere in democratic societies. Below, I argue that public scrutiny of ML filtering is necessary to sustain democratic principles, and I show why the shift to AI introduces new challenges to public oversight.
AI content moderation and social welfare
Online discourse facilitated by digital platforms constitutes our modern public sphere (Balkin, 2018; Yemini, 2019). As aptly described recently by Justice Kennedy of the Supreme Court of the United States: “While in the past there may have been difficulty in identifying the most important places (in a spatial sense) for the exchange of views, today the answer is clear. It is cyberspace—the ‘vast democratic forums of the Internet’ in general, and social media in particular” (Packingham v. North Carolina, 137 S. Ct. 1730, 1737 (2017)).
Here, I focus on a subset of content moderation by platforms, where ML is deployed to filter and remove unwarranted content. There are several reasons why this aspect of content moderation is particularly significant for the common good. First, gaining some knowledge about which expressions have been filtered out from the public sphere is a social interest. The democratic ideal of self-governance by the people assumes access to information and the right of free deliberation by informed citizens, so that they may collectively decide their common destiny (Baynes, 1992; Benhabib, 1993). Without accountability, the removal of expressions from the public sphere may silence some speakers (e.g. social activists, political opponents) and may deprive the public of access to legitimate speech.
Arguably, the public sphere is shaped not only by what is missing but also by what is available (Klonick, 2018). Therefore, arguably, public scrutiny should apply not only to upload filters and automated removal of expressions but also to any failure to remove harmful content that could potentially affect the public welfare. Yet controversial content posted online can become the subject of public debate, whereas the foundations of filtering and removal decisions are not necessarily visible. Critical definitions of what counts as unwarranted content (e.g. copyright infringement, hate speech or terrorist propaganda) are concealed in the ML code (Grimmelmann 2015). Moreover, while the “notice and takedown” (N&TD) regime offers a procedure to challenge the removal of content by platforms (Urban et al., 2017), we currently lack a similar mechanism to contest robust ML filtering of expressions.
Another reason to focus on the removal of content is that filtering and removal of expressions are often performed in response to a legal duty or an informal expectation by governmental authorities (Elkin-Koren and Haber 2016; Kreimer, 2006). The danger here is that public authorities will nudge private businesses to perform law enforcement tasks, thereby bypassing constitutional restraints on the use of power by government to restrict speech. Such unjustifiable content removal may amount to censorship (Citron, 2018; Heldt, 2019). Consequently, in such cases, there is a strong public interest to ensure that filtering complies with the rule of law and that it does not bypass constitutional guarantees to free speech.
Finally, public oversight of ML filtering is also warranted by the free market approach to content moderation (Wu, 2018). The view that content moderation by social media platforms reflects a “marketplace of ideas,” and therefore that platforms should be free to deploy content moderation without any governmental oversight, assumes that users are able to distinguish and select the type of content moderation which accommodates their preferences (e.g. conservative platforms or liberal platforms which remove hardly any content). Yet, when filtering by ML removes expressions in non-transparent ways, it fails to offer accurate signals to users, and therefore fails to support functional market competition.
Overall, these considerations expose a strong public interest in ensuring that upload filters and removal systems deployed by private platforms are aligned with the overall societal interest. Moreover, where such systems are deployed in law enforcement, it is necessary to ensure they comply with the rule of law.
There are good reasons to doubt that AI filters would indeed reflect the public interest. First, content moderation decisions are highly context-sensitive, and the AI tools available today are still far from offering accurate outcomes. For instance, the non-profit Counter Extremism Project has argued that Facebook has failed to track and remove well-known Islamist extremist propaganda. At the same time, it was recently reported that YouTube’s ML system had erroneously deleted over 100,000 human rights videos reporting from Syria about chemical attacks (O’Flaherty, 2018). Moreover, experts have argued that AI cannot make accurate determinations based solely on linguistic analysis of content, since the meaning of any particular content is contextual (Engstrom and Feamster, 2017). We need to understand the context in order to distinguish between an infringing reproduction and an entertaining parody, or between hate speech and legitimate protest. AI tools are currently inferior to human reviewers in considering context. Therefore, it is necessary to ensure that memes, posts, and videos which are removed or blocked indeed represent hate speech or otherwise illicit content, rather than legitimate protest against police violence or brutal dictatorships (Bloomberg Editorial Board, 2017). To enhance accuracy, some systems rely on proxies. For instance, systems used for flagging terrorist propaganda also employ geo-location and require human moderation to achieve accurate outcomes. This raises a whole new set of issues regarding fairness, accountability, and the potential biases of such systems.
Another issue is potential conflicts of interest. Platforms use AI to match users and content. Their business model is based on generating data on users and extracting revenues from selling users’ profiles for targeted advertising or other data-driven products and services (Evans and Schmalensee, 2016). The matching of users and content is intended to enhance the amount of data collected on each user, the types of data collected, and the freshness of the data (Plantin et al., 2018). Platforms would thus seek to sustain content, which may increase social media engagement, and only remove content which is likely to compromise that goal. From a societal perspective, however, content removal may involve weighing a wide range of interests, including privacy, individual safety, and national security, as well as free speech, the right to protest, and access to culture and knowledge.
How AI transforms the way we govern speech
The use of ML to identify and remove unwarranted speech is transforming the way we govern the public sphere: from governance by norms to governance by AI.
Legal norms shape speech through explicit rules and principles. Public law defines the boundaries of free speech by offering a definition of what speech is, and by enforcing speech norms. Contractual terms of use, such as Facebook’s Community Standards, also offer definitions of legitimate speech (Klonick 2018; Yemini 2019). The rule of law implies that people are governed by rules: a transparent set of norms reflecting a social contract, which apply equally to all members of society, and which are administered through social institutions. Lawmakers aim to produce laws in order to maximize social welfare by striking an acceptable balance between conflicting goals and interests. Limitations on speech governed by law are similarly defined by explicit norms, reflecting our social choices. Laws are enforced through impartial court judgments, where courts are asked to apply general norms to particular instances, and thereby decide how to achieve an optimal balance between the conflicting interests at hand.
AI, by contrast, introduces a new type of governance. ML filters manifest dynamic and adaptive decision-making processes driven by data. This type of governance involves the collection of massive amounts of data. Data analytics techniques are applied to these vast data stores in order to identify patterns and correlations that can be used to predict trends and outcomes, and subsequently to perform an automated action (e.g. filtering, removing, blocking).
Content moderation can be based on matching, where uploaded content is compared against fingerprints or hashes of known works, or on prediction, where classifiers trained on past examples flag new content as objectionable.
The governing of content through AI differs from governing speech through norms in several important respects. First, governing speech via AI depends on vast amounts of data and ongoing surveillance, which may threaten privacy. Indeed, the logic of Big Data is based not only on mass surveillance but on an uncompromising demand for full transparency on the part of users without reciprocal transparency on the part of data processors and owners (Zuboff, 2019). As such, the use of AI for governing online speech may also affect equality and human dignity, and lacks sufficient assurances of due process (Burk, 2019; Citron, 2008; Noto La Diega, 2018). I leave discussion of these issues to another day.
Second, ML systems conflate the commercial interests of platforms and the governmental functions they perform. The different functions performed by platforms in content moderation—i.e. the commercial (private) functions and the law enforcement (public) functions—are embedded in the same ML system, making use of the same data and governed by the same features (Elkin-Koren and Perel, 2020).
Consider, for instance, copyright enforcement. Copyright law defines copyright infringement and entitles the owner to enforce the copyright in court. When YouTube’s Content ID automatically flags allegedly infringing materials, it effectively decides which content constitutes an infringement (judicial power) and also acts to remove, disable, or filter such content (executive power). Moreover, Content ID also effectively defines the scope of copyright protection by determining how much similarity between the original and the alleged copy is necessary in order to trigger removal (legislative power). At the same time, the system offers right holders the opportunity to monetize the content while sharing the profits with YouTube (Sag, 2017). This may create a systematic conflict of interest embedded in the design (Balkin 2016). ML algorithms are designed to maximize profits for their facilitators and are therefore likely to be commercially biased in non-transparent ways (Plantin et al., 2018). Platforms are obviously entitled to promote their legitimate business interests. Yet, when platforms perform public functions (e.g. filtering content to enforce the law), or where they otherwise regulate speech which affects the public good, the private interests and the public functions of platforms become inseparable (Elkin-Koren and Perel 2020). The ML algorithm used for (public) law enforcement purposes will be informed by the same labeling of users and content—and will make use of the same application programming interface, learning patterns, and software for matching users and content—used to advance commercial purposes.
Finally, AI conceals the way we decide speech-related norms and social tradeoffs.
ML systems introduce a different type of governance. They do not simply apply norms, but also generate new norms. Content moderation by AI might be designed to achieve an explicit objective (e.g. identifying hate speech), but it may also carry a regulatory consequence: shaping users’ behavior by distinguishing between legitimate and illegitimate expression (where only the latter is removed). Unlike legal norms, which include explicit definitions of illegal content (e.g. infringing materials, violent speech), ML algorithms are designed to identify patterns and make predictions, without having to explicitly reveal the norms being applied. They simply detect, classify, and predict trends and outcomes, which are often followed by automated actions (e.g. filtering, removing, or blocking the content). The classification of any particular content as legitimate or illicit will often depend on predefined features and assigned weights, as well as on the specific set of training data employed, and it is likely to be shaped by new learning over time.
ML-based content moderation is probabilistic (Ananny, 2019). Decisions to ban or remove content may depend on many dynamic variables: whether the content triggered a computational threshold; whether similar content has triggered the system before; whether third parties flagged the content or similar content; and how often these things have occurred. This feature of ML-based content moderation challenges an important principle of the rule of law, which assumes self-governance by autonomous agents who are capable of directing their behavior according to norms.
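To make this point concrete, the sketch below shows what such a threshold-based, probabilistic removal decision might look like. The features, weights, bias, and threshold are invented for illustration and are not drawn from any actual platform system.

```python
# Illustrative sketch of a probabilistic removal decision. The weights over the
# dynamic variables mentioned above are assumptions; real systems learn them
# from data and adjust them over time.
import math

WEIGHTS = {"model_score": 3.0, "prior_triggers": 0.4, "third_party_flags": 0.6}
BIAS = -2.5
REMOVAL_THRESHOLD = 0.8

def removal_probability(model_score: float, prior_triggers: int, third_party_flags: int) -> float:
    """Combine the classifier score with dynamic signals into a single probability."""
    z = (BIAS
         + WEIGHTS["model_score"] * model_score
         + WEIGHTS["prior_triggers"] * prior_triggers
         + WEIGHTS["third_party_flags"] * third_party_flags)
    return 1.0 / (1.0 + math.exp(-z))  # logistic score in [0, 1]

def decide(model_score: float, prior_triggers: int, third_party_flags: int) -> str:
    """The same content may or may not cross the threshold, depending on shifting signals."""
    p = removal_probability(model_score, prior_triggers, third_party_flags)
    return "remove" if p >= REMOVAL_THRESHOLD else "keep"
```

The same upload can thus be kept one week and removed the next, simply because flag counts or prior triggers have shifted, which is exactly what makes the applied norm hard for users to anticipate.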
Moreover, ML systems lack an explicit definition of the tradeoffs they effectively apply, as well as procedures for deciding tradeoffs, and for deliberating and negotiating conflicting interests. These systems are set to optimize a certain function, e.g. removing copyright-infringing materials as defined by right holders, or removing terrorist content based on known hashes. The technical definition of such goals necessarily reflects certain value tradeoffs, manifested by excluding some features or tweaking the system to prefer one outcome over another. Technical definitions of particular features and their appropriate weight effectively define whether a particular image, text, or video is classified as illegitimate speech that is subject to removal. Thus, norms are set through a dynamic aggregation and analysis of previous instances and patterns, while the outcome itself might be inexplicable.
Content moderation always involves tradeoffs, value choices, and setting priorities. What constitutes illicit content, and who is authorized to decide it, reflect a political choice. Legal norms consist of explicit tradeoffs which subject freedom of speech to limitations: to protect private interests (e.g. personal reputations which are protected by defamation law), fundamental rights (e.g. laws prohibiting hate speech), or national security (e.g. espionage laws protecting classified materials). Indeed, the laws of different countries will often strike different tradeoffs between free speech and these other interests (Keller, 2019). Yet, as we shift to governance of speech by ML, we are losing an important opportunity to scrutinize norms and socially negotiate the values tradeoffs they embed.
All in all, AI removal systems are designed to optimize particular objectives, which are shaped by the platforms’ commercial interests. The tradeoffs embedded in these systems may overlook some values in the public interest (e.g. fair use, free speech). The institutional shift in regulatory power from the state to private companies allows crucial tradeoffs to be decided by market players, regardless of their commercial biases. We currently lack sufficient tools to examine what these tradeoffs are and to ensure these systems indeed reflect our social contract (Langvardt, 2018).
Oversight and its limits
ML content moderation may affect social welfare, while lacking compatible checks on the use of power and adequate safeguards against misuse. Studies on the algorithmic implementation of the N&TD regime by platforms have underscored the vulnerability of such systems to misuse, absent appropriate oversight. For example, it has been shown that these systems have been extensively misused by third parties to remove non-infringing materials which have little to do with copyright (Bar-Ziv and Elkin-Koren, 2018; Urban et al., 2017). These findings underscore the vulnerability to fraud and misuse in these nontransparent automated enforcement systems.
Here, I focus on a different challenge: how to ensure that AI removal systems reflect our social contract.
With the rapid advances in AI, and its regulatory power to shape behavior, there is growing awareness of the need to keep society in the loop (Rahwan, 2018). The literature pertaining to governance by AI has proposed transparency and oversight measures to help enhance accountability in ML systems (Kadri and Klonick, 2019; Katyal, 2019). Reform proposals include requiring greater transparency in governmental use of AI (Brauneis and Goodman, 2018) or expanding governmental investigative authorities (U.S. The Algorithmic Accountability Bill, 2019).
These are important measures, but they suffer from major limitations (Selbst and Barocas, 2018). ML algorithms are opaque. Even if the system’s objectives and metrics are explicitly announced, much depends on the implementation of these high-level values in the code. While norms (e.g. features, weights) can be made explicit, the dynamic nature of ML systems means that such transparency is less useful. Outcomes are largely shaped by the training data, and by real-time data collected by operating the system and generating feedback. Sometimes, as in the case of neural networks, the process of extracting patterns from data—and, hence, the link between input data and outcomes—is not even transparent or explicable to the data scientists in charge of the system. Transparency reports and public oversight have generally proven futile in ensuring that these algorithmic regimes advance social welfare (Keller, 2018).
Moreover, algorithms and data are often protected as trade secrets and are inaccessible to the general public—including data regarding the success of the algorithm’s past predictions, which is collected and used to train the algorithm so that it can make more accurate predictions in the future (Wexler, 2018). Consequently, there is no way to tell why one piece of content is deemed legitimate and another subject to removal. As such, it is difficult to detect biases against particular speech or speakers, and subsequently to design tools for addressing such biases. When systems are designed to disable speech before it is even posted, tracking missing content becomes even trickier, creating further barriers to public oversight (Urban et al., 2017).
Another approach to enhance oversight is procedural, introducing a right to appeal an unjustified removal decision. Studies have shown, however, that even where a right to appeal is available (such as a counter notice procedure under the DMCA), in practice it is rarely exploited, and overall has failed to provide effective remedies (Urban et al., 2017: 44).
The opacity of the tradeoffs applied by algorithms makes it difficult to apply conventional measures of public and legal oversight to them. It is also difficult to design legal checks for dynamic learning algorithms, which change constantly, within a wider technological ecosystem where ongoing development is the norm. Algorithmic governance may thus require new types of ongoing legal interventions that enable continuous negotiation of values and public scrutiny.
Contesting algorithms
On the whole, content moderation by AI may affect social welfare, with no sufficient mechanism for ensuring that such systems reflect our social contract and comply with the rule of law. Several scholars have suggested ways to fix this problem from within, proposing a participatory framework which involves different stakeholders either in the design process (Lee et al., 2019) or in monitoring compliance with social values (Rahwan, 2018). Some proposals seek to shape the outcome of content moderation by holding platforms liable for harmful content (Perry and Zarsky, 2015). Common to these proposals is an attempt to influence the configuration of the dominant content moderation system. Here, I propose a different approach: keeping an ongoing check on the use of AI removal systems by introducing friction. I see the main problem in the current algorithmic content moderation regime as residing in the lack of any adversarial pressure. The system is set to optimize a single outcome, reflecting a predetermined tradeoff which leaves no space for the deliberation, negotiation, and contestation that are essential for governing speech in liberal democracies. The purpose of contesting algorithms is to bridge this gap. The idea is to insert an automated checkpoint prior to removal, reflecting diversified social values, and thereby create a check on the dominant removal system.
Below, I introduce the framework of contesting algorithms and lay out its basic principles and advantages.
An adversarial Public AI
An adversarial public system (“Public AI”) would serve as an algorithmic check on the platform’s removal system: before content classified as illicit is removed, that classification would be tested against an independent system designed to reflect countervailing public interests, such as fair use or free speech.
The underlying idea behind contesting algorithms is to counterbalance the single optimization standard of current content removal systems by introducing an adversarial framework. Unlike subversive strategies, which propose to use adversarial code or data to reconfigure dominant ML systems (e.g. Protective Optimization Technologies; Kulynych et al., 2020), the adversarial design proposed here intends to create a check. Subversive tools may facilitate social protest and serve important political goals, but are unlikely to transform the social outcome, as presumably platforms would undertake preventive measures to counter them, leading to a technological arms race.
An adversarial strategy seeks to revive the public interest in content moderation. Consider, for instance, the removal of alleged copyright-infringing materials. Copyright law defines a set of tradeoffs. It seeks to foster the creation of new works of authorship by securing incentives for authors and, at the same time, ensuring the freedom of current and future authors to use existing works. Limitations and exceptions to copyright, such as fair use under U.S. law, or “quotation, criticism, review” and “caricature, parody or pastiche” under European law, serve as a check on copyright, to make sure it does not stifle the very creativity that the law seeks to foster. These limitations may also protect other social values, such as freedom of expression (Geiger and Izyumenko, 2019). In content moderation governed by rules, these conflicting values will be contested in the courts, and tradeoffs will be decided through adjudication. The ML removal system, by contrast, is likely to reflect a monolithic tradeoff. Consider, for instance, a meme posted on Facebook which consists of a copyrighted image (say, a Disney figure) and an original text (say, a humorous text criticizing corporate power). Such a meme pits the interest of the right holder (Disney) in preventing unauthorized use of a copyrighted work against the free expression interest of the meme creator. When a removal system automatically removes the meme after finding a match to a copyrighted Disney image, it applies a particular tradeoff, which gives priority to the rights of copyright owners.
Under the proposed system, once the platform ML algorithm has classified content as infringing, but before it takes action (e.g. removing the content), the platform will be obliged to test the classification by running the content through the Public AI system. The Public AI system would be designed to reflect the values codified by copyright law, including fair use and free expression. It would embed a quantified definition of legitimate use, reflecting the aggregated opinions of relevant stakeholders who might be affected by the system, even if their interests are not currently reflected in the platform’s removal system (“social interests”). In the copyright example, these stakeholders could consist of legitimate users of content recognized under the current copyright scheme, such as students using materials for self-learning, teachers using works for educational purposes, librarians, documentary film makers, or users engaging in non-profit remixing. The input for designing the model and training data could be based on decided copyright cases pertaining to legitimate use of content (“fair use”). Models could also use parameters identified by legal scholars who have studied a variety of factors in decided cases in order to predict the fair use outcomes (Beebe, 2008; Netanel, 2011; Sag, 2012). Additionally, the model could be informed by observational data, such as analyses of the actual use of copyrighted materials by major stakeholders such as librarians, educators, journalists, or participants in remix communities.
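As a rough illustration of how such a Public AI model might be built, the sketch below trains a simple classifier on decided cases encoded by the four statutory fair use factors. The feature encoding, the choice of logistic regression, and all names are assumptions: the proposal itself leaves the modeling details open to the stakeholders and data sources described above.

```python
# Illustrative sketch only: a fair use classifier trained on decided cases,
# encoded by the four statutory factors. The encoding and model choice are
# assumptions, not part of the proposal.
from dataclasses import dataclass
from sklearn.linear_model import LogisticRegression

@dataclass
class UseFeatures:
    transformative_purpose: float  # factor 1: purpose and character of the use
    creative_work: float           # factor 2: nature of the copyrighted work
    amount_taken: float            # factor 3: amount and substantiality of the portion used
    market_harm: float             # factor 4: effect on the potential market

def to_vector(u: UseFeatures) -> list:
    return [u.transformative_purpose, u.creative_work, u.amount_taken, u.market_harm]

def train_fair_use_model(cases: list, outcomes: list) -> LogisticRegression:
    """cases: encoded decided cases; outcomes: 1 = fair use found, 0 = infringement found."""
    model = LogisticRegression()
    model.fit([to_vector(c) for c in cases], outcomes)
    return model

def legitimate_use_score(model: LogisticRegression, use: UseFeatures) -> float:
    """Probability that the contested use is legitimate, to be weighed against removal."""
    return float(model.predict_proba([to_vector(use)])[0][1])
```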
In short, the Public AI algorithm would be used to test whether removal decisions contravene other social interests, such as fair use or free speech. In the absence of such findings, removal of the content will proceed. In case of conflict, a dispute procedure will compare the scoring generated by each system to determine the probability of infringement/fair use. The system may further include explicit (quantified) rules defining which score should prevail, and when the dispute should be referred to human review (see Figure 1).
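A minimal sketch of that dispute procedure follows. The thresholds, and the rule for when each score prevails or the case is referred to human review, are purely illustrative choices, not part of the proposal itself.

```python
# Minimal sketch of the dispute procedure described above (and in Figure 1),
# with invented thresholds for illustration.
def adversarial_check(infringement_score: float, legitimate_use_score: float) -> str:
    """
    infringement_score: the platform system's estimate that the content is infringing.
    legitimate_use_score: the Public AI's estimate that the use is legitimate (e.g. fair use).
    """
    if legitimate_use_score < 0.3:
        return "remove"                # no conflict: the Public AI finds no protected interest
    if legitimate_use_score > 0.7 and infringement_score < 0.9:
        return "keep"                  # the contesting score prevails
    return "refer_to_human_review"     # genuine conflict between the two systems

# Example: a strong fair use signal contests a borderline infringement score.
print(adversarial_check(infringement_score=0.85, legitimate_use_score=0.8))  # -> "keep"
```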
How should such an adversarial design be promoted? A relatively simple step would be to modify the current safe harbor regime. The law could condition safe harbor upon testing any removal scoring generated by the platform on the Public AI prior to removal. Regulators may also define the scope of social interests to be considered by contesting algorithms, by specifying which stakeholders should be represented or what weight should be given to any particular interest.

Figure 1. The proposed adversarial process.
The virtues of an adversarial strategy
Adversarialism is an essential principle both in computer science and in legal procedure. In computer science, adversarial strategies are deployed to explore the weaknesses of “black box” ML algorithms. Adversarial learning algorithms offer a tool for ongoing monitoring of opaque ML algorithms. For instance, Generative Adversarial Networks use unsupervised ML to automatically probe and expose the patterns learned by the main ML system (Marcks von Würtemberg, 2017).
Similarly, adversarialism is essential to the rule of law (Hildebrandt, 2019). Adversarial legal procedures, where parties are called to present their contesting positions in front of a judge or a jury, are one of the fundamental tenets of common law justice systems and a gold standard of dispute resolution. Adversarial procedures are presumably better equipped to test evidence, determine the truth, and reach sound decisions. Thus, they offer a procedural check on state power in criminal cases or appeals against the misuse of administrative power.
Content moderation by ML currently lacks comparable adversarial mechanisms. Consequently, a system that is designed to optimize a single objective (e.g. removing any materials which match sampled content provided by copyright holders) may overlook a wide range of social interests that might be implicated by this choice (e.g. education, fair use, political parody). The present proposal seeks to address this gap by mandating an adversarial procedure.
An adversarial approach which challenges the monolithic system of content moderation by platforms carries several advantages. Notable among these are facilitating pluralism, ensuring dynamic ongoing oversight, restoring the public interest, and promoting competition and innovation.
Facilitating pluralism
Content moderation involves tradeoffs between conflicting values and interests of various stakeholders. Presumably, if we knew in advance which values should be considered in each and every case of content dispute, and what tradeoffs should apply, we could make these tradeoffs mandatory, requiring platforms to give particular weight to some values. For instance, if we believed that every use for educational purposes necessarily constitutes fair use, we could require platforms to reconfigure their systems to disable any removal of content used in education. Yet, a guiding principle of contesting algorithms is to keep the adversarial system separated from the dominant system of content moderation. Rather than attempting to reconfigure the original system and alter the optimization model, I propose that “legitimate use” should be determined by an independent system.
One reason to prefer an adversarial design over a mandatory tradeoff rule is that often we don’t know what the relevant tradeoffs are. A classic example is the fair use doctrine under American copyright law (Section 107 of the U.S. Copyright Act). This open-ended standard assumes that in a dynamic world of rapidly changing technological and economic circumstances, it is impossible to predict all circumstances of legitimate unauthorized use of copyrighted materials. Therefore, the law intentionally avoids setting explicit tradeoffs. Instead, the law authorizes courts to adjust limitations on copyright by considering four factors (the purpose and character of the use; the nature of the copyrighted work; the amount taken; and the effect on the market) when determining whether the unauthorized use of certain copyrighted material is legitimate (Netanel, 2011).
Another reason to avoid strictly defining tradeoffs ex ante is that even if the relevant tradeoffs can in theory be identified, it may be difficult to reach a social consensus on how to apply them. If we required platforms to embed mandatory social tradeoffs, society would be called on to determine all tradeoffs ex ante. Governance by rules, by contrast, enables social actors to agree on high-level principles (e.g. the exercise of copyright should be subject to fair use) and leave the details of such tradeoffs to be worked out by courts applying these principles in particular cases down the road. This enables the legal system to leave space for disagreement. Indeed, pluralist governance assumes that citizens do not always agree on how to balance competing values and interests. Therefore, our democratic governance structure leaves room for disagreement by creating institutions where tradeoffs can be deliberated, negotiated, and decided (elections), and where they are subject to oversight (judicial review).
To preserve pluralism, the adversarial strategy takes a procedural approach. It does not set any particular norm, but instead creates a procedure for contesting competing values in an algorithmic environment. This is a democratic move: we don’t need to reach consensus on the tradeoffs, but instead can agree on a legitimate procedure by which these tradeoffs can be decided. This approach may also bypass constitutional barriers. In some jurisdictions, such as the United States, mandating particular tradeoffs on platforms by law might also be considered a radical intervention in freedom of speech, and as such might be considered unconstitutional (Keller, 2019).
Moreover, the adversarial design may underscore the public/private divide, assist in challenging the legitimacy of private removal choices, and confine the scope of content removal. A monolithic design makes it difficult to clearly identify the political choices which were actually made by the current system, e.g. which features were considered in determining legitimate speech, and the weight given to each. The adversarial design intends to clearly separate the two automated processes of deciding values—private and public. Each system will optimize the objectives it was designed to achieve. Once tradeoffs become explicit they can become the subject of deliberation and public scrutiny.
Dynamic ongoing oversight
Content moderation by AI is not rule-based. As described above, ML content moderation involves a feedback loop, whereby the model is refined based on previous performance. The data is the code. This dynamic method of applying and generating norms might be incompatible with a predefined mandatory tradeoff. For instance, in a hypothetical example of a mandatory tradeoff which gives priority to an educational purpose over copyright limits, the quantified definition of educational use may change over time, reflecting learning acquired by running new data. Unless we trust the platforms to make such determinations, we need a strategy that can contest these outcomes in a dynamic way.
For the reasons discussed above, we cannot trust platforms to overcome their conflicts of interest when training and modifying learning algorithms. Moreover, monitoring compliance with cumbersome regulations such as those dealing with content moderation is likely to be costly and inefficient. The proposed adversarial system could overcome some of these difficulties by introducing a dynamic strategy of ongoing checks.
Restoring the public interest
The dominant removal system embeds a value choice informed by the commercial interests of platforms and their business partners. This choice is further reinforced by its underlying learning mechanisms. In the absence of alternative values and tradeoffs, this system becomes monolithic. Since we cannot trust platforms to reflect the relevant social tradeoffs, we must ensure that there is an up-to-date alternative articulation of those values which are absent from the dominating platform.
Articulating the values and interests of stakeholders which are underrepresented by the dominant AI removal system creates a space for developing a comprehensive public alternative to that system. In this context, algorithmic governance offers new opportunities, since the aggregation of individual models, and the resulting policy operations of different stakeholders, are digitally coded.
Competition and innovation
Incentivizing platforms to run their removal data on a Public AI may not only keep society in the loop but may also generate data necessary for innovation.
Platforms often do not openly share their data. By putting a wall around such data, platforms not only make it difficult to scrutinize their governing functions (Bruns, 2019) but also create barriers to the development of AI systems in the public interest. To effectively identify legitimate uses, contesting algorithms require data on removals of illicit content. Obliging platforms to test allegedly illicit content on the contesting system would provide indispensable data for developing and improving such systems. This, in turn, could facilitate a market of independent systems for identifying legitimate use in additional contexts. For instance, AI systems which identify fair use might be useful in educational institutions that make available large quantities of teaching materials for educational purposes and are required to determine fair use on an ongoing basis (Elkin-Koren, 2017). A competitive market in AI for identifying legitimate use would consequently promote innovation by creating competitive pressures on market players to invest in improving their systems.
Conclusions
The framework of contesting algorithms seeks to tame the power of private platforms to silence speech by re-introducing the social interest into the ML loop. It may offer a practical tool for reducing false positives generated by the dominant removal systems. It does not address false negatives, namely decisions to restore content despite its potential harm to the public interest. For reasons discussed in this paper, these cases raise a different type of challenge, which can be addressed by existing legal and public oversight tools.
Contesting algorithms may also have a political dimension. One could think of content moderation by AI as a robust system of social engineering, which is designed to advance predetermined values. The system model embeds a value choice (e.g. what constitutes illicit speech), and that choice is reinforced by its underlying learning mechanisms. In the absence of alternative values and tradeoffs, this robust system becomes monolithic and threatens our democratic discourse.
Facilitating an adversarial intervention may enable members of society to articulate societal interests, not simply as a critique, but also as a functional means, which interfaces with the dominant infrastructure. This new type of collective action may enhance accountability by creating space for a political action regarding the relevant tradeoffs. Overall, this framework could enhance freedom in an environment that is increasingly governed by AI.
Acknowledgements
I thank Yochai Benkler, Michael Birnhack, Michal Gal, Ellen Goodman, Seda Gürses, Maayan Perel, Helen Nissenbaum, and Moran Yemini for excellent comments and suggestions. I also thank the participants of TILTing Perspectives 2019 and the research seminars at the Berkman Klein Center for Internet and Society at Harvard University, Cornell Tech Digital Life Initiative, the Weizenbaum Institute, and the Edmond Safra Center for Ethics, Tel-Aviv University, for great conversations.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Israel Science Foundation (grant 1820/17).
