Sage Journals: Discover world-class research

Abstract

As digital platforms are increasingly shaping access to information, their control over platform data has profound implications for research, innovation, and accountability. This article presents the first comprehensive empirical study of how digital platforms have used contractual terms to enclose publicly available data over the past decade. Analyzing the Terms of Use of 279 platforms from 2012 and 2022, we identify a systematic increase in both restrictive and permissive clauses governing access to data for research purposes. We find that platforms adopted a dual contractual strategy: restrictive terms limit third-party access to data for research, while permissive terms preserve the platform's own ability to use and share data with selected partners. This strategy emerged alongside the rise of consent-based privacy regulation, especially the EU's General Data Protection Regulation (GDPR), which, despite its objective of protecting users, inadvertently expands platforms’ discretion over data access. Our findings suggest that privacy regulation has facilitated, rather than curtailed, platform data enclosures—enabling platforms to assert de facto control over data in the public domain. We contribute to the literature on data governance, platform regulation, and contract theory by documenting how platforms adapt boilerplate contracts in response to regulatory change and leverage privacy obligations to restrict independent research. These findings carry significant implications for legal and policy debates surrounding data access, transparency, and the future of platform accountability. They underscore the need to regulate the privatization of publicly available data to ensure equitable access for research serving the public interest.

Keywords

Digital platforms, platform data, access to data, data enclosures, unauthorized access to data, DSA, GDPR, CFAA, terms of service, privacy policies, terms of use, data-driven research, data scraping, automatic data collection, right to research

Introduction

On 25 March 2024, the District Court of the Northern District of California rejected X Corp.'s claim against the non-profit Center for Countering Digital Hate. X alleged that the Center had breached X Corp.'s Terms of Service (ToS) by collecting public tweets to document an increase in hate speech on X's site since its purchase by Elon Musk. Similar attempts to stifle platform accountability research have been more successful. In 2019, for example, Facebook (Meta) halted a disinformation study conducted by New York University's (NYU) Cybersecurity for Democracy (Vincent, 2021) by sending a cease and desist letter claiming that the study team's use of a third-party application to collect data hosted on Meta's server violated its ToS (Chiauzzi and Wicks, 2019).

As platform data takes on a more central role in research and development, legislators, courts, and scholars have become increasingly concerned about the constraints on data access for scientific purposes, particularly when access to publicly available data is restricted by contractual barriers imposed by digital platforms (Fiesler et al., 2020; Huq, 2021; Kadri, 2021; Mancosu and Vegetti, 2020; Rubinfeld and Gal, 2017). In the EU, the Digital Services Act (DSA) introduced a new procedure for obtaining access to data for specific research purposes. Similarly, the proposed Platform Accountability and Transparency Act (PATA) in the United States would require large digital platforms to allow data access for research purposes. Meanwhile, over nearly three decades of litigation, US courts have voiced multiple, sometimes conflicting opinions on the enforceability of contractual terms restricting data access (Solove and Hartzog, 2025; Perrino, 2023).

To date, only one study, published in 2020, has investigated platforms’ use of their Terms of Use (ToU) (including their ToS, privacy policies, community guidelines, and similar end-user agreements) to restrict access to data (Fiesler et al., 2020). This article contributes to this literature by providing new empirical evidence on contractual restrictions affecting access to data for research. It examines how data protection regulations intersect with the drafting of platforms’ ToU and how this dynamic shapes the scope of permissible access to platform data for research purposes. Analyzing the ToU of 279 digital platforms, we identify 12 types of contractual clauses related to data access. We assess their prevalence and practical implications, and demonstrate how platforms invoke privacy obligations to block third-party data collection and create “data enclosures,” which, in effect, restrict access to data.

This article is organized as follows: the “Background” section presents the background for our analysis, clarifying that, from a legal perspective, platform data largely remains in the public domain and should, in principle, be freely accessible for researchers to collect and utilize. Nevertheless, platforms exploit the interaction between their ToU and data protection regulations to effectively enclose publicly available data. The “Data and methodology” section outlines our research methodology. Our findings, presented in the “Results” section, demonstrate the increasing prevalence of ToU restrictions on third-party access to data, even as platforms increasingly rely on their ToU to secure broad rights to use the data as they see fit. The “Discussion” section discusses these findings and their implications. Specifically, it describes the unintended consequences of data protection regulation; that is, platforms’ misusing users’ rights to promote their own business interests. Employing a restrictive strategy in their ToS and community guidelines to limit researchers’ access to data, platforms can avoid unwanted scrutiny and maintain their competitive edge. At the same time, platforms apply a facilitating strategy in their privacy policies, enabling themselves, selected affiliates, and paying customers to use the data for platform-approved research that aligns with the platforms’ business interests.

Background

Platform data scraping and its academic significance

Lawyers and social scientists view data differently. From a legal perspective, data is commonly understood as recorded “facts about a situation, person, [or] event” (Cambridge Dictionary)—discrete “units of information” governed by privacy and property laws (Woods, 2016). Privacy laws focus on whether data reveals “information relating to an identified or identifiable natural person” (GDPR, Art. 4), that is, whether it is “personal” (Viljoen, 2021). Intellectual property laws treat data as an intangible economic asset, raising questions of ownership and permissible use (Determann, 2018). From an economic perspective, data is inherently non-rivalrous: its use by one person does not deprive others of the opportunity to use it. Therefore, data does not entail the usual justifications for assigning property rights.

In contrast, social scientists increasingly view data not as mere facts, but as constructed artifacts shaped by context and human choice, such as platform design. Data is produced—a context-dependent representation of reality that is open to interpretation (Kitchin, 2014; Marres, 2020; Rogers, 2021). Recently, legal scholarship has started to embrace these insights, viewing data as a form of power—an instrument of control underpinning decision-making and market dominance (Fahey, 2021). This view emphasizes data's constructed nature and the impact of “data bias,” highlighting the importance of paying careful attention to how data is collected and used (Zarsky, 2014).

This article focuses on legal barriers to accessing publicly available platform data—what Richard Rogers calls “native” digital data, such as posts, links, hashtags, and images shared by users on digital platforms (Rogers, 2021). The latter may include social media websites (e.g. Facebook and Twitter/X), as well as digital marketplaces (e.g. Amazon and eBay), video-gaming sites (e.g. Rumble), image-sharing platforms (e.g. Pinterest), and social bookmarking services (e.g. Reddit and Plurk).

Although data is non-rivalrous and, in principle, easy to duplicate, digital platforms control access to unique and unmatched data sets (Rubinfeld and Gal, 2017). Publicly available platform data is especially valuable for research. As Noortje Marres notes:

[Platforms] have rendered social life reportable, interpretable, shareable, and influenceable in potentially new ways. […] The very same digital transformations that have made available new types of social data and enabled the application of new computational methods in social research are equally affecting the role of locatedness, embodiment, latency, atmosphere, and so on in social life, in short, its contextual—or […] “situational”—character. (Marres, 2020: 2)

Consistent with Rob Kitchin’s 2014 characterization of big data, platform data offers vast volume, velocity, variety, and veracity (the “four Vs”)—real-time and historical data (including metadata) about the preferences, actions, opinions, and concerns of users, advertisers, and political and social actors worldwide. Platform data spans diverse ethnic and socioeconomic groups, including “vulnerable populations that are traditionally difficult to reach” (Nebeker et al., 2020: 2). Few databases can compete with its scale and richness.

Platform data has thus become a “living lab” of invaluable real-time data, providing researchers with opportunities to reach new frontiers of knowledge and understanding, and challenging existing research methodologies (Rogers, 2021). It serves a wide array of research needs across various disciplines: Computer scientists use it to train natural language processing (NLP) and artificial intelligence (AI) models (Perriam et al., 2019), while health researchers deploy it to analyze mental health and substance abuse patterns (Nebeker et al., 2020). Platform accountability research leverages platform data to investigate such critical issues as political ad targeting (Horwitz, 2020), social polarization (Israeli and Tsur, 2022), discrimination (Edelman et al., 2017; Laouénan and Rathelot, 2022), and platform bias (Patro et al., 2022).

In the social sciences and the humanities, Rogers (2021) and Marres (2020) use platform data to develop innovative methodologies that blend quantitative and qualitative approaches to create nuanced “metapictures” and “situational analytics” that reveal the complex dynamics of digital social environments. These and other critical perspectives (Gitelman, 2013), which emphasize data's contextual and situational character, highlight researchers’ need for independent access to platform data, rather than platform-mediated access, such as via Facebook Open Research and Transparency (FORT).

Scraping publicly available platform data: The legal landscape

Notwithstanding the importance of platform data for research and development in the era of AI and LLMs, digital platforms employ various self-governed contractual and self-imposed technological measures to keep publicly available data under their exclusive control (Schwalbach and Mauer, 2025). They do so precisely because they lack a clear basis for claiming any proprietary rights in it.

To begin with, trade secret laws are largely irrelevant to publicly available platform data due to the lack of secrecy in this regard (18 U.S.C. § 1839).

Additionally, copyright laws explicitly exclude data from their protectable subject matter (Determann, 2018). With data considered a building block for future innovative and creative works, granting rights in it would contradict the primary objective of intellectual property (IP) legislation of promoting scientific progress and innovation (Determann, 2018). Only compilations of data exhibiting sufficient originality in their selection and arrangement may qualify for copyright protection (Determann, 2018; Huq, 2021). Such protection, however, does not extend to the underlying facts (17 U.S.C. § 102(b)). Moreover, even when platform data is sufficiently original to qualify for copyright protection (notably certain user-generated content), copyright laws include exceptions and limitations that permit the use of such content for socially beneficial purposes, including research (17 U.S.C. § 107).

Finally, as recently explained by the District Court of the Northern District of California in X v. Bright Data, platforms typically obtain only a non-exclusive license to their users’ content, primarily to limit their legal liability for harm caused by user-generated content (UGC). Consequently, since platforms cannot assert rights in excess of those that they actually acquire, they may not lawfully exclude third parties from accessing such content.

Absent IP protection, access to publicly available platform data largely remains subject to two legal frameworks: privacy law and contracts. Publicly available data that includes personal information may be subject to privacy and data protection laws (Solove and Hartzog, 2025). Even though data subjects voluntarily published it on a publicly accessible website, they did not necessarily relinquish their privacy rights to that data. In Carpenter v. United States, for instance, the Supreme Court held that there is a reasonable expectation of privacy under the Fourth Amendment for geolocation data about publicly observable automobile movement. Accordingly, although privacy laws have no direct bearing on questions of ownership of (or rights to) data (Determann, 2018; but see Lessing, 1999; Westin, 1967), they do offer a parallel, competing framework pertaining to the access and use of personal data (Solove and Hartzog, 2025).

Over the past decade, the collection and use of platform data have been subjected to a heavy regulatory burden. In March 2012, the US Federal Trade Commission (FTC) published its recommendations for protecting users’ privacy in an era of rapid change (FTC, 2012). The FTC recommended that “companies […] limit data collection to that which is consistent with the context of a particular transaction or the consumer's relationship with the business, or as required or specifically authorized by law” (FTC, 2012: 27, 42). Especially noteworthy in this regard is the aftermath of the Cambridge Analytica scandal, first reported on 17 March 2018, involving the exploitation of personal data gleaned from around 50 million Facebook profiles, originally collected for research purposes, by a political consulting firm that was hired to bolster Donald Trump's 2016 presidential campaign (Cadwalladr and Graham-Harrison, 2018; Rosenberg et al., 2018). The resulting public outcry placed personal data protection and user consent at the heart of regulators’ agenda. Most notably, Europe's General Data Protection Regulation (GDPR), which came into effect on 25 May 2018, imposes specific requirements on data controllers, such as digital platforms, to protect the interest of data subjects in their personal data.

The GDPR allows personal data to be processed only if permissible under one of six lawful bases enumerated by the regulation (including processing necessary for a contract, legal obligation, or legitimate interests). The section most relevant to the use of platform data for research, the GDPR's Article 13, requires data controllers to inform data subjects of how their data will be used, for how long it will be stored, and how users can exercise their data-related rights. By extension, these terms create a consent-based regime for access to and use of personal information. Article 13 also mandates that any transfer of data to non-EU countries should be subject to an agreement between transferor and transferee that incorporates the privacy protections of EU legislation, allowing data subjects to enforce those rights as a third-party beneficiary, and requires data controllers to disclose the identity of any third party that may handle the data. Article 28 further subjects any transfer of data from controllers to processors to a written contract that not only ensures controllers’ power and discretion over processors’ activities and any onward transfer of data to sub-processors, but also requires controllers to verify processors’ compliance with the GDPR.

Regarding the platform transparency and accountability obligation, which may support platform data research, Article 5 of the GDPR requires data controllers to meet transparency obligations and adopt appropriate technical and organizational measures that would enable them to demonstrate compliance with the law.

Finally, the GDPR provides data subjects a right of claim for harms caused by platforms’ non-compliance with the GDPR, as well as the right to withdraw their consent to data processing. GDPR violations may also result in administrative fines of up to four percent of the violating platform's annual revenues.

In addition to privacy laws, publicly available data is governed by self-imposed measures, namely technological barriers and contractual restrictions that define the scope of permissible data access (Elkin-Koren et al., 2025). Specifically, platforms regularly deploy various contractual limitations to restrict researchers’ access to platform data via their ToS, privacy policies, or community guidelines (Fiesler et al., 2020).

Next, we explain how these two legal frameworks, privacy law and contract law, could be exploited to create data enclosures.

Prior research and research hypothesis and significance

Although platforms have minimal legal claims to the publicly available data hosted on their servers, they possess strong commercial and reputational incentives to restrict data access. Exercising exclusive control provides platforms with competitive advantages over data-deprived newcomers (Rubinfeld and Gal, 2017), enables selective sharing with partners and paying customers (Huq, 2021), and shields platforms from the regulatory scrutiny that may result from platform accountability research (Kadri, 2022). According to the literature, these commercial and reputational interests likely influence how platforms craft their ToU.

This study explores the contractual restrictions on access to data imposed by digital platforms. It builds on two complementary bodies of research that support the hypotheses detailed below.

The first examines platforms’ use of contractual terms to restrict data access. While multiple studies have analyzed platforms’ contracts (Karanicolas, 2021; Pałka, 2025; Suzor, 2018; Wiśniewska and Pałka, 2023), only one prior study—Fiesler et al. (2020)—has specifically investigated how platforms employ their ToU to restrict data access and create what we term “data enclosures.” Analyzing the ToS of 117 social media platforms as they appeared in November 2017, Fiesler et al. identified four types of restrictive terms: three that directly prohibit data collection (manual, automatic, or any type of collection) and one requiring users’ permission prior to data collection. They found that only 25 platforms (21%) included none of these four types of restrictive terms.

Our study extends this foundational work in three critical ways. First, we examine a broader range of platforms beyond social media sites, including digital marketplaces, video-gaming sites, image-sharing platforms, and social-bookmarking services. Second, we significantly expand the analytical framework, identifying 12 distinct types of access-to-data terms rather than Fiesler et al.'s four categories. Third, our temporal scope captures the period between 2012 and 2022—a period marked by pivotal developments in data governance, including the Cambridge Analytica scandal and the implementation of the GDPR, enabling us to trace the evolution of platform ToU and assess how significant regulatory and ethical developments influenced platform drafting practices.

The second body of research on which this study builds examines how regulatory changes shape contract drafting practices, providing further theoretical support for our hypothesis that platforms adapt their ToU in response to evolving legal and ethical environments. Contracts serve as flexible mechanisms for governing market relationships. This is particularly true of boilerplate contracts that are drafted unilaterally, offered on a “take it or leave it” basis, and typically grant drafters discretion to unilaterally modify terms (Furth-Matzkin and Sommers, 2020; Sales, 1953; Wilkinson-Ryan, 2014). Research demonstrates that digital platforms frequently amend their ToU (Elkin-Koren et al., 2022), and that they do so in response to shifts in the applicable legal and ethical landscape (Choi and Gulati, 2004, 2006), changes in party characteristics and relationships (Marotta-Wurgler and Taylor, 2013), and evolving market dynamics (Bar-Gill and Davis, 2010; Becher and Benoliel, 2021; Ben-Shahar and Pottow, 2006; Davis and Marotta-Wurgler, 2020; Taylor, 2011).

Drawing on these two bodies of literature, we expect platforms to strategically adapt their ToU in response to relevant changes in their legal and regulatory environment. This expectation is particularly well-founded given the evolution of privacy law during our study period. As we have seen, the GDPR and analogous US privacy legislation—including the Fair Credit Reporting Act (FCRA) and the Health Insurance Portability and Accountability Act (HIPAA)—establish strict consent-based frameworks that restrict the use of personal data. These regulatory developments create opportunities for platforms to leverage the interplay between contractual terms and privacy obligations to justify restrictive data access policies.

Building on these theoretical foundations, we advance the following hypotheses: (1) digital platforms make extensive use of contractual mechanisms to enclose the data they host; (2) between 2012 and 2022 platforms significantly revised the data access terms in their ToU to align with evolving legal and ethical standards; and (3) these changes were strategically designed to serve platforms’ commercial and reputational interests by leveraging privacy clauses to restrict third-party access to publicly available data while preserving their own discretionary right to use and share such data.

Our study makes four distinct contributions to the literature and practice of data access. First, we provide the first comprehensive empirical analysis of how platforms systematically adapted their ToU in response to major regulatory changes between 2012 and 2022. This temporal analysis reveals how platforms strategically exploit regulatory developments to advance their interests while restricting third-party research access.

Second, we offer policymakers empirical evidence of regulatory gaps and their unintended consequences (Dari-Mattiacci and Marotta-Wurgler, 2022). Our findings demonstrate how platforms have leveraged privacy regulations designed to protect users to create data enclosures that serve corporate rather than public interests (Kadri, 2022). This evidence is particularly timely given ongoing regulatory efforts, such as Article 40 of the EU's Digital Services Act and the proposed US Platform Accountability and Transparency Act, which seek to mandate research access to platform data.

Third, we contribute to contract theory by documenting how platforms engage in “drafting in the shadow of regulation” (Mnookin and Kornhauser, 1979); that is, actively exploiting regulatory frameworks rather than merely complying with them (Hwang and Jennejohn, 2022). Our findings suggest that courts should interpret platforms’ ToU within their regulatory context rather than as neutral contractual arrangements between equal parties, as suggested by recent decisions like X Corp. v. Bright Data.

Fourth and finally, the risk of legal liability is driving researchers to seek out new scraping techniques (Dryer and Stockton, 2013; Schwalbach and Mauer, 2025), such as the counter-archiving technique used to study Facebook after it had shut down researchers’ access to public data through its application programming interface (API) (Ben-David, 2020) and the API-bypassing methodology based on a screen scraping routine of public Facebook posts (Mancosu and Vegetti, 2020). Our study enables researchers to further develop methodologies that account for these legal constraints while pursuing accountability research and other socially beneficial investigations. In turn, it provides practical guidance for navigating the evolving contractual landscape of platform data access (Perriam et al., 2020).

Data and methodology

Our study investigates the contractual frameworks constructed by platforms to control access to and the use of platform data for non-commercial research purposes, as reflected in digital platforms’ ToU.

To compose a thorough and inclusive list of digital platforms, we followed Fiesler et al. (2020) by first including all platforms in Wikipedia's 2022 lists related to digital platforms; namely, the lists of social-networking services, online marketplaces, online video platforms, image-sharing websites, and social-bookmarking sites (a total of 333 digital platforms). After excluding platforms that were no longer operational, platforms whose ToU were inaccessible via the Wayback Machine (see below) or their websites, and platforms irrelevant to our research context (e.g. airline ticket companies), our list comprised 279 platforms.

For each platform, we analyzed the ToU effective at the data collection point (1 July 2022) and, using the Wayback Machine, the ToU effective as of 1 July 2012. For platforms that have both a mobile (app) and a web-based interface, we only analyzed the web-based ToU. We selected 1 July 2012 because 2012 is the year in which the European Commission introduced the GDPR, which significantly altered the ethical, legal, and regulatory landscape applicable to personal data protection. Specifically, and as discussed in the “Background” section, the GDPR's Articles 13 and 28 create a regime for the collection, use, and transfer of publicly available platform data that is based on user consent.

The ToU were collected manually. For platforms that had not yet been established in 2012, or that did not post the relevant contracts on their website, we used the earliest available version of the contract. If a platform had blocked the Wayback Machine from accessing its databases, we used past versions of its contracts posted on its website, where available. Finally, for non-English ToU, we used Google Translate to read and compare the legal documents.

The comparison between the 2012 and 2022 versions of platforms’ ToU focused on searching for any additions to, changes in, or deletions from the language of terms implicating access to data. To facilitate the process of encoding, we conducted a preliminary reading of the ToU of four platforms (Facebook, LinkedIn, Google, and Yelp) to identify 12 types of access-to-data terms. We then distinguished between terms that restrict access to data for research and those that facilitate such access, even if at the platforms’ sole discretion, as such facilitating terms could be interpreted in a manner that is favorable to researchers.¹ We also distinguished between terms that have a direct effect on access to data and those that have an indirect effect.

In total, we identified eight terms that directly or indirectly restrict access to data and four that facilitate such access. Four types of terms directly restrict access to data: no automatic data collection, prohibiting the use of automated scripts or web crawlers; no data collection, restricting both automatic and manual data collection; no surveys, preventing researchers from “stor[ing], …, any answers, responses, comments or opinions…”; and users’ explicit consent terms, prohibiting researchers from collecting data without first receiving each user's explicit consent.

Four other types of terms place indirect restrictions on researchers’ ability to collect platform data: one account—real name or identity terms, which oblige users to “use the same name that you use in everyday life,” thereby restricting researchers’ attempts to collect data under a concealed identity²; no overriding security terms, which ban researchers from bypassing the anti-scraping technological measures deployed by platforms; no unauthorized access terms, which prohibit access that is not otherwise authorized by the platform; and no overburdening system terms, which may impose an indirect prohibition on the automatic collection of data, to the extent that such collection would “disable, overburden, or impair the proper working or appearance” of the platform website or its products (Table 1).³

Table 1.

Terms restricting third-party access-to-data for research.

Direct restrictions	Indirect restrictions
No automatic data collection	One account—real name/identity, and so on
No data collection	No overriding security
No surveys	No unauthorized access
Users’ explicit consent required to collect data	No overburdening system

We also identified four types of terms that may facilitate researchers’ data access. Two terms directly support the collection of data for research: Platform research activities terms, which allow platforms to use the data to conduct research for purposes set by the platform, such as DataForGood; and academic research terms, which permit platforms to share data with researchers, usually at the platforms’ discretion and subject to their terms. Finally, two other terms indirectly support access to platform data for academic purposes: cross-platform data sharing terms, which permit platforms to share data among affiliated platforms; and cross-border data transfer terms, which allow data to be shared across national borders (Table 2).

Table 2.

Terms facilitating access-to-data for platforms research.

Access to data for research purposes	Access to data (general)
Platform research activities	Cross-platform data sharing
Academic research	Cross-border data transfer
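The two-dimensional classification in Tables 1 and 2 can be written down as a small data structure. The following is only an illustrative sketch in Python, not part of the study's tooling: the class names (Effect, Reach, TermType) and the helper count_by are assumptions introduced here, and the labels are shortened from the tables.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    RESTRICTS = "restricts"      # Table 1
    FACILITATES = "facilitates"  # Table 2


class Reach(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


@dataclass(frozen=True)
class TermType:
    label: str
    effect: Effect
    reach: Reach


# The 12 term types, classified as in Tables 1 and 2.
# Class and field names are hypothetical; only the labels come from the tables.
TERM_TYPES = (
    TermType("No automatic data collection", Effect.RESTRICTS, Reach.DIRECT),
    TermType("No data collection", Effect.RESTRICTS, Reach.DIRECT),
    TermType("No surveys", Effect.RESTRICTS, Reach.DIRECT),
    TermType("Users' explicit consent required", Effect.RESTRICTS, Reach.DIRECT),
    TermType("One account / real name or identity", Effect.RESTRICTS, Reach.INDIRECT),
    TermType("No overriding security", Effect.RESTRICTS, Reach.INDIRECT),
    TermType("No unauthorized access", Effect.RESTRICTS, Reach.INDIRECT),
    TermType("No overburdening system", Effect.RESTRICTS, Reach.INDIRECT),
    TermType("Platform research activities", Effect.FACILITATES, Reach.DIRECT),
    TermType("Academic research", Effect.FACILITATES, Reach.DIRECT),
    TermType("Cross-platform data sharing", Effect.FACILITATES, Reach.INDIRECT),
    TermType("Cross-border data transfer", Effect.FACILITATES, Reach.INDIRECT),
)


def count_by(effect: Effect, reach: Reach) -> int:
    """Count the term types that share a given effect and reach."""
    return sum(1 for t in TERM_TYPES if t.effect is effect and t.reach is reach)
```

Keeping effect and reach as separate fields mirrors the two distinctions drawn in the text (restricting versus facilitating, direct versus indirect) and makes it easy to group terms along either one.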

After collecting all platforms’ ToU in both their 2022 and 2012 versions, we provided our team with detailed guidelines for identifying and categorizing each type of term to ensure consistency across all reviewers. Our team then manually read each contract to create a spreadsheet detailing whether and which of the 12 access-related terms appeared in each platform contract's 2012 version, its 2022 version, both, or neither.

We recorded each time any one of the 12 types of terms appeared in any particular contract, including in different contracts from the same platform, as well as the number of platforms that use any one of the terms in one (or more) of their ToU, in both 2012 and 2022. We then assessed whether the change in the language between the two versions was substantial or minor. Changes in the scope of the restriction were deemed substantial, while wording-only changes with no relevant legal effect were considered minor. To increase accuracy, we employed a text comparison tool for each section of the contract, allowing us to visualize the changes between the two versions. Our findings were documented in an Excel spreadsheet, indicating for each platform in the sample whether any one of its contracts contained each of the 12 terms (assigned the number 1) or not (assigned the number 0), as depicted in the spreadsheet extract shown in Figure 1.
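The text comparison step can be approximated with standard tooling. The sketch below uses Python's difflib module to show line-level changes between two versions of a clause. It is an illustration only: the study does not name the comparison tool it used, the function clause_diff is a hypothetical helper, and the two clause texts are invented for the example. Whether a change is substantial or minor remains a legal judgment made by the reader of the diff.

```python
import difflib
from typing import List


def clause_diff(old_clause: str, new_clause: str) -> List[str]:
    """Return a unified diff between the 2012 and 2022 text of one clause."""
    return list(
        difflib.unified_diff(
            old_clause.splitlines(),
            new_clause.splitlines(),
            fromfile="2012",
            tofile="2022",
            lineterm="",
        )
    )


# Invented clause texts, used only to demonstrate the output format.
OLD = (
    "You may not use automated scripts to collect information.\n"
    "You may contact us with questions."
)
NEW = (
    "You may not use automated or manual means to collect information.\n"
    "You may contact us with questions."
)

if __name__ == "__main__":
    for line in clause_diff(OLD, NEW):
        print(line)
```

In this invented example the diff shows that the prohibition was widened from automated scripts to manual collection as well, which is the kind of change in scope that the coding scheme treats as substantial.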

Figure 1.

Excel spreadsheet extract.
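A binary coding sheet of this kind lends itself to simple prevalence calculations. The following Python sketch assumes a CSV layout with one row per platform and year and one 0/1 column per term type. The layout, the column names, the platform names, and all values are hypothetical and do not reproduce the study's spreadsheet.

```python
import csv
import io

# Hypothetical extract: every name and value below is invented for illustration.
SHEET = """platform,year,no_automatic_collection,no_data_collection,academic_research
PlatformA,2012,0,0,0
PlatformA,2022,1,0,1
PlatformB,2012,1,0,0
PlatformB,2022,1,1,0
PlatformC,2012,0,0,0
PlatformC,2022,0,0,0
"""


def load_codes(text):
    """Parse the sheet into (platform, year, {term: 0 or 1}) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        codes = {k: int(v) for k, v in row.items() if k not in ("platform", "year")}
        rows.append((row["platform"], int(row["year"]), codes))
    return rows


def prevalence(rows, year, term):
    """Share of platforms whose ToU for the given year contain the term."""
    flags = [codes[term] for _, y, codes in rows if y == year]
    return sum(flags) / len(flags)


rows = load_codes(SHEET)

if __name__ == "__main__":
    for year in (2012, 2022):
        share = prevalence(rows, year, "no_automatic_collection")
        print(year, f"{share:.1%}")
```

Because each cell is 0 or 1, the mean of a column within a year is the share of platforms using that term, so the change between 2012 and 2022 is the difference between the two means.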

Results

Access to data for scientific purposes: Main findings

We begin by providing an overview of the current state of access to platform data for non-commercial research purposes. We found no terms with relevant legal implications that did not fit within the 12 terms identified in our initial readings of the four platforms. Our findings support our main hypothesis that digital platforms extensively use contractual measures to enclose the data hosted on their servers. Of the 279 platforms reviewed, 208 (74.6%) have contractual terms that directly restrict access to data by explicitly prohibiting the collection of data, the automatic collection of data, or both. Most platforms also incorporate one or more terms that indirectly restrict access to data, placing contractual limitations on unauthorized access to data (59%), the overriding of the platform's technological security measures (57%), and the overburdening of the platform's computers (53%).

Of the 71 platforms (25.4%) that do not explicitly restrict access to data, 47 (16.8%) employ one or more contractual terms that indirectly achieve a similar end, requiring researchers to gain each user's explicit consent before accessing and collecting data; restricting unauthorized access, the overriding of security measures, or the overburdening of the platform's computers; or limiting users to only one account and obliging them to state their real name when using the platform. Thus, only a small minority of platforms (8.6%) place no direct or indirect contractual restrictions on the collection of data for research purposes.
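The shares reported in the two preceding paragraphs follow from simple arithmetic on the stated counts. The short Python sketch below is a check of that arithmetic only; the variable and function names are hypothetical.

```python
# Counts as reported in the text; names are illustrative.
TOTAL_PLATFORMS = 279
direct_restriction = 208      # platforms with directly restrictive terms
indirect_only = 47            # no direct restriction, but indirect terms

no_direct_restriction = TOTAL_PLATFORMS - direct_restriction
no_restriction_at_all = no_direct_restriction - indirect_only


def share(count, total=TOTAL_PLATFORMS):
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {
    "direct": share(direct_restriction),
    "no direct": share(no_direct_restriction),
    "indirect only": share(indirect_only),
    "none": share(no_restriction_at_all),
}
```

Under these counts, 24 platforms carry neither direct nor indirect restrictions, which corresponds to the 8.6% figure.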

The platforms that do not contractually restrict access to data have little in common. They include sites such as Readgeek, a book recommendation platform that may have little commercial or reputational interest in restricting access to data, as well as platforms that might be expected to restrict access to data for research, such as Gentlemint, sometimes described as “Pinterest for Men” (Knapp, 2012), and Gaysir, which, as its website explains, is a Norwegian social network aimed at the LGBTQIA+ community.

Alongside the widespread use of contractual terms to restrict access to data for research, platforms also commonly include provisions that indirectly facilitate their own research efforts and enable data sharing with selected partners. Notably, 226 platforms (81.9%) include cross-platform data sharing terms, and 158 platforms (57.2%) employ cross-border data transfer terms to satisfy the GDPR's regulatory restrictions on transferring data from EU member countries to non-EU states.

Platform research activities terms, which directly enable platforms to use data for their own research, are also standard and found in nearly two-thirds of platforms’ ToU (63.8%). A minority of platforms (14.1%) also include academic research terms in their ToU. While these terms ostensibly permit external researchers to use platform data, in practice, they subject such use to the platform's discretion, both regarding the research topic and the identity of the researchers.

Finally, our findings also reveal a division of labor in platforms’ use of the different user agreements (Suzor, 2018). To assess this result, we assigned −12.5 points to a contract each time it included one of the eight types of terms that directly or indirectly restrict access to data, and +25 points each time it included one of the four types of terms that directly or indirectly support access to data. Thus, a contract that included all 12 terms received a score of 0; a contract that included only and all the terms that support access to data received a score of 100; and a contract that incorporated only and all the terms restricting access to data received a score of −100. As depicted in Figure 2, we found that while ToS had an average score of −36.1, privacy policies’ average score was +46.5. This, we suggest, demonstrates platforms’ use of different contracts to speak in various voices. Specifically, while platforms use their ToS agreements to enclose data, employing contractual measures to deny others’ access to and use of the data hosted on their servers, they use their privacy policies to carve out exceptions to the regulatory and contractual restrictions placed on the use of data. In turn, this ensures that only they, their affiliates, and their chosen (paying) customers are able to access and use the enclosed data as they see fit.

Figure 2.

Facilitating scores.
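The scoring rule described above can be written out as a short Python sketch. The weights (−12.5 and +25) and the eight/four split come from the text; the function names and the example inputs are hypothetical.

```python
RESTRICTIVE_WEIGHT = -12.5  # applied to each of the eight restrictive term types
SUPPORTIVE_WEIGHT = 25.0    # applied to each of the four supportive term types


def facilitating_score(n_restrictive, n_supportive):
    """Score one contract from the number of term types it contains."""
    if not 0 <= n_restrictive <= 8:
        raise ValueError("n_restrictive must be between 0 and 8")
    if not 0 <= n_supportive <= 4:
        raise ValueError("n_supportive must be between 0 and 4")
    return n_restrictive * RESTRICTIVE_WEIGHT + n_supportive * SUPPORTIVE_WEIGHT


def average_score(contracts):
    """Mean score over (n_restrictive, n_supportive) pairs."""
    scores = [facilitating_score(r, s) for r, s in contracts]
    return sum(scores) / len(scores)


# Boundary cases stated in the text.
all_terms = facilitating_score(8, 4)
only_supportive = facilitating_score(0, 4)
only_restrictive = facilitating_score(8, 0)

# Invented example: three restrictive term types and one supportive one.
example = facilitating_score(3, 1)
```

Because the supportive weight is twice the restrictive weight in absolute value, the scale is symmetric around zero even though there are twice as many restrictive term types.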

Changes over time: The evolution of terms controlling access to platform data

We used two types of comparisons to analyze the evolution of contractual terms controlling access to platform data for research purposes. First, at the contractual level, we compared the frequency with which platforms’ ToU include terms that restrict (or permit) access to data for academic purposes in July 2022 and in July 2012. Overall, we examined 567 contracts. Second, at the platform level, we compared the number of platforms that included access-restricting (or -permitting) terms in at least one of their contracts in 2012 and 2022. Overall, this analysis included 264 platforms.

The findings of our first analysis show that there was a relatively slight, but statistically significant, increase in the use of all types of access-restricting terms in platforms’ ToU between 2012 and 2022. In particular, we found a 22% increase in the number of one account—real name terms in platforms’ ToU and a 23% increase in the number of times that no unauthorized access terms appeared in those contracts.

Much more substantial was the increase in terms that may facilitate platform data research. The largest increase occurred in academic research terms, which provide platforms with discretionary powers to share data with third parties for research purposes, and whose frequency nearly doubled over the period, from 22 platforms in 2012 to 41 platforms in 2022. The second and third largest increases, of 79% and 61%, respectively, occurred in terms permitting cross-border data transfer and platform research activities. Finally, we found a significant increase (23%) in the use of cross-platform data sharing terms, which indirectly facilitate platform data research. The complete analysis of the changes in all of the terms examined is depicted in Figure 3.

Figure 3.

Cross-contract comparison.

The findings of our second analysis show a similar increase in the number of platforms, as opposed to the number of contracts, which include terms that (directly or indirectly) facilitate access to platform data for research purposes in at least one of their contracts. The largest increase was in the use of academic research terms. Between 2012 and 2022, the number of platforms that included terms allowing the use of data for academic research increased by 90% from 20 to 38. Meaningful increases were also recorded in the number of platforms that include cross-border data transfer terms (84%), platform research activities terms (57%), and cross-platform data sharing terms (24%).

We additionally found statistically significant, but more modest, increases in the number of platforms that restrict access to data for research. In particular, between 2012 and 2022, the number of platforms that included a no unauthorized access term in at least one of their ToU increased by 20%; the use of one account—real name terms increased by 17%; and no overriding security terms by 16%. The complete analysis is detailed in Figure 4.

Figure 4.

Cross-platform comparison.

The above findings suggest that, alongside a significant increase in terms that directly or indirectly support platform data research, there was a parallel, smaller, increase in terms that restrict platform data research. The findings, however, should be viewed against the base level of terms facilitating or restricting access to data for academic purposes. For example, while the percentage increase in the number of platforms that included academic research terms between 2012 and 2022 (90%) was far greater than the percentage increase for no unauthorized access terms (20%), far fewer platforms include the former term in their 2022 ToU (38) than the latter (155).
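The contrast between relative growth and base level can be checked from the counts reported in this section. The Python sketch below is illustrative only; `percent_increase` is a hypothetical helper, and the inputs are the academic research counts given for the platform-level analysis (20 to 38) and the contract-level analysis (22 to 41).

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100 * (after - before) / before


# Academic research terms, using the counts reported in the text.
platform_level = percent_increase(20, 38)
contract_level = percent_increase(22, 41)

# 2022 base levels at the platform level, as reported in the text.
academic_research_2022 = 38
no_unauthorized_access_2022 = 155
base_level_gap = no_unauthorized_access_2022 - academic_research_2022
```

A large relative increase from a small base (20 to 38 platforms) still leaves the term far less common than one that grew more slowly from a large base.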

Furthermore, recall that there were twice as many types of terms that restrict access to platform data (eight) as support it (four). Accordingly, and as already mentioned, to explore the impact of changes in platforms’ drafting practices over the 10-year period, we examined the degree to which each type of contract enabled or restricted platform data collection by assigning +25 points to each term that facilitated access to platform data and −12.5 points to each term that restricted such access.

Our findings, depicted in Figure 2, show that the tendency of both community guidelines and ToS to restrict platform data research increased between 2012 and 2022: Community guidelines received scores of −14 in 2012 and −17.1 in 2022, while ToS received scores of −33.7 in 2012 and −36.1 in 2022. This suggests that platforms’ ToS, which predominantly define the bilateral relationship between platforms and their users (including researchers), have taken a more restrictive stance in relation to platform data collection.

In contrast, privacy policies are becoming more supportive of platform data collection, with a score of 29.5 in 2012 and 46.5 in 2022. This means that platforms’ privacy policies, which unilaterally define what data controllers may or may not do with their users’ personal data, have become substantially more supportive of platform data collection over time. Taken together, these findings support our second and third hypotheses, that platforms deploy their ToS to enclose data, while at the same time using their privacy policies to define the scope of permissible data collection to advance their own interests, and that these practices have proliferated over time. As we explain in the following subsection, although the design of our experiment cannot conclusively support a causal link, the correlation between our findings and the changes we describe in platforms’ regulatory environment suggests that these results may represent an unintended, albeit anticipated, consequence of privacy regulations, such as the GDPR. Its consent-oriented approach expands and strengthens platforms’ ability to control access to data via private ordering.

Discussion

Contractual response to regulatory changes

When viewed as a whole, our findings demonstrate how and why digital platforms, as profit-driven private entities, structure their contractual relationships with their users to simultaneously minimize legal risks and maximize opportunities to monetize and leverage data for business purposes. As we discuss below, platforms’ ToU have been designed to maintain control over third-party access to data while safeguarding the platforms' right to use that data, regardless of the regulatory developments.

In particular, our findings illuminate the possible influence that major regulatory and ethical changes can exert on platforms’ contracts.⁴ We identified significant changes to platforms’ ToU governing data access between July 2012 and July 2022. While we cannot establish a definitive causal relationship, these changes correspond with the growing emphasis on personal data protection. As we hypothesized, and as we elaborate further below, the regulatory shift appears to have prompted platforms to redefine the scope of contractually permissible access to platform data for research purposes, facilitating exclusive control over publicly available data.

First, the emergence of consent-based regulatory requirements governing the use of personal data between 2012 and 2022 helps explain the proliferation of academic research and platform research activities terms in platforms’ ToU. The growing prevalence of these terms reflects platforms’ efforts to ensure that their data practices comply with evolving consent-based regulation, preserving their ability to share data with selected research partners in ways that align with their business interests.

At first glance, these terms appear to support the use of platform data for research and, in turn, potentially enhance platforms’ reputation and foster public trust. However, because the GDPR makes data sharing conditional upon users’ consent (in the absence of another basis for processing personal data), it grants platforms broad discretion to determine who may access platform data and for what purpose. Platforms have invoked user privacy claims in litigation against scrapers—such as X v. Bright Data—to justify restricting access to their data. Yet, in this case, the court was not persuaded. In rejecting X's claim that its ToU were designed to protect user privacy, Judge Alsup held that “X Corp., however, is not looking to protect X users’ privacy. It contends that ‘improper scraping … interferes with X Corp.'s own sale of its data through a tiered subscription service’ […], [yet] X Corp. is happy to allow the extraction and copying of X users’ content so long as it gets paid.”

Second, and relatedly, the introduction of new restrictions on the transfer of data outside the European Union may explain the significant increase we have identified in the use of cross-border data transfer and cross-platform data sharing terms. Although both terms indirectly support platform data research by breaking down regulatory barriers, platforms may use the discretion these terms usually provide to permit platform-supervised research that aligns with their commercial interests, while denying independent researchers the opportunity to use the data to advance the public interest in academic research in general, and for platform accountability in particular.

Third, the consent-based framework of the GDPR alongside the public outcry following the Cambridge Analytica scandal may further account for the increase in terms restricting access to platform data for research. This possible relation further underscores the interaction between newly enacted consent-based privacy regulations and the simultaneous increase in terms that both limit third-party data access and permit platform-approved research. For example, the significant rise in the use of no unauthorized access terms may be another unintended, albeit anticipated, consequence of the privacy regulations enacted since 2012, which created the strict data use regulatory framework described earlier. The same rationale accounts for the increased frequency of terms that directly restrict access to data for research purposes, that is, no automatic data collection, users’ consent, no data collection, and no surveys terms.

Fourth and finally, the increased frequency of one account—real name, no overriding security, and no overburdening systems terms similarly demonstrates how platforms exploit their regulatory obligations to protect users’ privacy, in order to strengthen their control over the flow of data and its uses (Fiesler et al., 2020). This provides empirical support to previous research, suggesting that platforms piggy-back on privacy regulations to advance their interests by enclosing the data hosted on their servers (Davis and Marotta-Wurgler, 2019; Van Loo, 2022).

Our findings, therefore, suggest that changes in privacy regulation interact on a profound level with platforms’ drafting practice, resulting in a double-drafting strategy being employed by platforms in their ToU. This consists of a restrictive strategy, which sets limits on access to platform data by third parties, and a facilitative strategy, which grants platforms a contractual right to utilize users’ data in their own (commercial) research and subjects any noncommercial research—for example, research that is conducted using Meta's FORT API—to the platform's sole discretion.

Platform data enclosure via contracts

In his seminal work, The Second Enclosure Movement and the Construction of the Public Domain, James Boyle described the expansion of intellectual property in the digital era with reference to a useful analogy to the historical enclosure of common lands by landowners in England, which entailed publicly accessible land being fenced off and privatized for exclusive use (Boyle, 2003). Our findings regarding platforms’ ToU drafting practices and their attempt to create their own data regime take Boyle's claim a step further, demonstrating how contractual, rather than property, rights can rival, conflict with, and even override the public ordering of access to data, transforming data from a public resource into a privatized one.

The enclosure of data by platforms results from a combination of two types of contractual strategies made evident by our findings. The facilitating strategy ensures that platforms are authorized to employ users’ data for research and development purposes in any way they see fit. This affirmative contractual right is then supplemented by platforms’ restrictive strategy, which employs contractual terms to limit the collection and use of data by others, thus creating new barriers to accessing data that would otherwise be publicly accessible.

Contracts apply to their contractual parties. Yet, platforms often draft broad provisions, seeking to subject anyone who accesses the platform's website to their ToU, regardless of whether they have an active account. Such provisions are intended to contractually bind even random website visitors, who do not have a user account, to the platform's ToU. In turn, this subjects any activity by registered and non-registered users alike, including researchers, to non-negotiable contractual terms unilaterally drafted by the platform.

Importantly, platforms often justify the imposition of their contractual terms on non-registered scrapers by reference to their legal obligations and reputational interest in protecting their users’ privacy. In a series of recent decisions, however, the District Court for the Northern District of California has begun to unravel the interplay between privacy regulation and contractual access-to-data restrictions. In Meta v. Bright Data, Meta argued that its contractual restrictions on scraping apply to both logged-in and logged-out users and behaviors, implying that any “unauthorized” scraper necessarily violates Facebook's and Instagram's Terms. The court was not convinced. Applying established canons of contract construction, it held that the ToU apply only to logged-in scrapers, not logged-out ones, stating that “Meta surely understands the difference between defeating anti-automated scraping and piercing privacy walls.”

In another decision by the same court, X v. Bright Data, the court addressed the interplay between the two types of contract-drafting strategies. Pursuant to X Corp.'s ToU, users own their content while the platform is granted a non-exclusive right to make use of the data. Nevertheless, and despite having no exclusive right in the data, X Corp. seeks to prevent the scraping of data by any third parties, even if they are no longer a party to the contract. However, “[A] non-exclusive licensee,” the court held, “has no more than a privilege that protects him from a claim of infringement,” and “because such a licensee has been granted rights only vis-à-vis the licensor, not vis-à-vis the world, he or she has no legal right to exclude others. […] Yet, that is exactly what X Corp. seeks to do with its claims based on the scraping and selling of data—to exclude others from using, copying, reproducing, processing, adapting, modifying, publishing, transmitting, displaying, and distributing X users’ content” (X v. Bright Data: 849). Consequently, although X Corp. acquires only a non-exclusive right from its users to use the content hosted on its platform, by using contractual terms to prohibit any type of scraping, it “ostensibly acquires” an exclusive right in data via its ToU, “upend[ing] the careful balance Congress struck between what copyright owners own and do not own, and what they leave for others to draw on” (X v. Bright Data: 852), and thus shrinking the public domain.

The court in X v. Bright Data observed that such contractual practices create a “massive regime of adhesive terms” that effectively grants platforms de facto property rights in data and “fundamentally alter[s] the rights and privileges of the world at large” (X v. Bright Data: 850). Our findings demonstrate that these contractual practices are not limited to X Corp. alone, but are much more prevalent.

The combination of data-related laws and regulations and the two contractual strategies demonstrated in our findings allows platforms to use their ToU to exert significant power and control over how data hosted or generated on their sites is shared and used outside the platform. If enforceable, these ToU would provide platforms with legal cover to override the legislative data regime, replace it with their own, and provide themselves with a property-like right in platform data. As we have seen, some courts have demonstrated their unwillingness to allow platforms’ enlistment of contract law to achieve this end. Overall, however, the enforceability of contractual provisions that may conflict with copyright law has been highly contested, and courts in the United States have routinely approved and enforced such contractual overriding of copyright norms in numerous cases, despite mounting criticism by academics (Rub, 2017).

Conclusion

Platform data is a critical resource for research and innovation in the public interest. From a proprietary perspective, a major portion of publicly available platform data is in the public domain and thus free for all to use. Platforms nonetheless restrict data access, distorting the delicate balance between exclusivity in proprietary goods and access to public goods.

This article provided empirical evidence supporting the prevalence of data enclosures. It showed that platforms draft their ToU to preserve their de facto control over the data hosted on their servers, while restricting access to and use of the same data by third parties. Our findings further identified a correlation between platforms’ legal obligations and their contract drafting strategies. Specifically, platforms use their data protection regulatory duties under the GDPR to employ a double-sided drafting strategy, restricting the use of publicly available platform data by third parties (including researchers), while maintaining their own privilege to use that very same data.

Such data enclosures generate data monopolies and thwart public benefit research and innovation. Legal policy should therefore strive to prevent them. One possible strategy is to adopt top-down regulatory measures that revoke the discretion of platforms to limit researchers’ access to and use of publicly available data. Article 40(12) of the EU Digital Services Act (DSA, 2022), for instance, requires very large online platforms and search engines to provide researchers studying systemic risks in the European Union with access to real-time data that is publicly accessible in their online interface. Likewise, the proposed US PATA (Perrino, 2023), if enacted, would shield researchers and journalists who collect publicly available data from digital platforms to conduct research on matters of public concern from legal liability.

In addition, courts can develop bottom-up, case-by-case exceptions and limitations to contractual terms seeking to restrict access to publicly available data for public benefit purposes (e.g. academic research). Recently, for example, a court in Cologne, Germany, found that Meta's use of such data to train its AI model did not violate the GDPR, despite Meta never obtaining users’ consent (Geseley, 2025). Likewise, in the United States, the District Court for the Northern District of California rejected X Corp.'s claim that Bright Data's scraping of the publicly available data posted by X Corp.'s users constituted a breach of X Corp.'s ToS, rejecting what it described as the platform's attempt to use its ToS to assert property-like rights in publicly available platform data. Eventually, a combination of top-down and bottom-up legal strategies may prevent data enclosures and ensure that publicly available data is accessible for academic and similar research conducted for public benefit.

Footnotes

ORCID iDs

Niva Elkin-Koren

Maayan Perel

Ohad Somech

Ethical considerations

There are no human participants in this article and informed consent is not required.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Author contribution

Equal contribution.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Israel Science Foundation (grant number 1870/21).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

Subject to appropriate ethical and legal considerations, the authors are willing to share the research data in a relevant data repository.

Notes

References

Bar-Gill and Davis (2010) Empty promises. California Law Review 84(1): 1–45.

Becher and Benoliel (2021) Sneak in contracts. Georgia Law Review 55: 657–729.

Ben-David (2020) Counter-archiving Facebook. European Journal of Communication 35(3): 249–264.

Ben-Shahar and Pottow JAE (2006) On the stickiness of default rules. Florida State University Law Review 33: 651–682.

Boyle (2003) The second enclosure movement and the construction of the public domain. Law and Contemporary Problems 66: 33–74.

Cadwalladr and Graham-Harrison (2018) Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. Available at: https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election (accessed October 27, 2024).

Carpenter v. United States, 585 U.S. 296 (2018).

Chiauzzi and Wicks (2019) Digital trespass: Ethical and terms-of-use violations by researchers accessing data from an online patient community. Journal of Medical Internet Research 21(2): e11985.

Choi and Gulati (2004) Innovation in boilerplate contracts: An empirical examination of sovereign bonds. Emory Law Review 53: 930–996.

Choi and Gulati (2006) An empirical study of securities disclosure practice. Tulane Law Review 80: 1023–1108.

Council Regulation 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market for Digital Services and Amending Directive 2000/31/EC (Digital Services Act), 2022 O.J. (L 277) 1.

Dari-Mattiacci and Marotta-Wurgler (2022) Learning in standard-form contracts: Theory and evidence. Journal of Legal Analysis 14(1): 244–314.

Davis and Marotta-Wurgler (2019) Contracting for personal data. New York University Law Review 94: 662–705.

Determann (2018) No one owns data. Hastings Law Journal 70: 1–44.

Dryer and Stockton (2013) Internet “data scraping”: A primer for counseling clients. New York Law Journal 7: 1–3. Available at: https://www.law.com/newyorklawjournal/almID/1202610687621/ (accessed October 27, 2024).

Edelman, Luca and Svirsky (2017) Racial discrimination in the sharing economy: Evidence from a field experiment. American Economic Journal: Applied Economics 9: 1–22.

Elkin-Koren, Di Gregorio and Perel (2022) Social media as contractual networks: A bottom-up check on content moderation. Iowa Law Review 107: 987–1049.

Elkin-Koren, Perel and Somech (2025) Unlocking platform data for research. Indiana Law Journal 100: 1479–1526.

European Parliament & Council of the European Union (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation). Official Journal of the European Union, L119, 1–88.

Fahey (2021) Data federalism. Harvard Law Review 135: 1007–1081.

Federal Trade Commission (2012) Recommendations for protecting users’ privacy in an era of rapid change. Report, United States, March.

Fiesler, Beard and Keegan (2020) No robots, spiders, or scrapers: Legal and ethical regulation of data collection methods in social media terms of service. Proceedings of the International AAAI Conference on Web and Social Media 14: 187–196.

Furth-Matzkin and Sommers (2020) Consumer psychology and the problem of fine-print fraud. Stanford Law Review 72: 503–560.

Geseley (2025) Germany: Court allows Meta to use data from public social media profiles to train AI. Available at: https://www.loc.gov/item/global-legal-monitor/2025-07-07/germany-court-allows-meta-to-use-data-from-public-social-media-profiles-to-train-ai/ (accessed July 15, 2025).

Gitelman (2013) “Raw Data” Is an Oxymoron. London: MIT Press.

Horwitz (2020) Facebook seeks shutdown of NYU research project into political ad targeting. Available at: https://www.wsj.com/articles/facebook-cuts-off-access-for-nyu-research-into-political-ad-targeting-11628052204 (accessed October 27, 2024).

Huq (2021) The public trust in data. Georgetown Law Journal 110: 333–402.

Hwang and Jennejohn (2022) Contractual depth. Minnesota Law Review 106: 1267–1322.

Israeli and Tsur (2022) Free speech or free hate speech? Analyzing the proliferation of hate speech in Parler. In: Proceedings of the Fourth Workshop on Online Abuse and Harms (WOAH): 109–121. Stroudsburg, PA, USA: Association for Computational Linguistics.

Kadri (2022) Platforms as blackacres. UCLA Law Review 68: 1184–1250.

Kadri (2021) Digital gatekeepers. Texas Law Review 99: 951–1003.

Karanicolas (2021) Too long; didn’t read: Finding meaning in platforms’ terms of service agreements. University of Toledo Law Review 52(1): 1–25.

Kitchin (2014) Big data, new epistemologies and paradigm shifts. Big Data & Society 1(1): 1–12.

Knapp (2012) Gentlemint is Pinterest for manly things. Available at: https://www.forbes.com/sites/alexknapp/2012/02/21/gentlemint-is-pinterest-for-manly-things/ (accessed October 27, 2024).

Laouénan and Rathelot (2022) Can information reduce ethnic discrimination? Evidence from Airbnb. American Economic Journal: Applied Economics 14: 107–132.

Lessig (1999) Code and Other Laws of Cyberspace. New York: Basic Books.

37.

Mancosu

Vegetti

(2020) What you can scrape and what is right to scrape: A proposal for a tool to collect public Facebook data. Social Media + Society 6(3): 1–11.

38.

Marotta-Wurgler

Taylor

(2013) Set in stone? Change and innovation in consumer standard-form contracts. New York University Law Review 88: 240–285.

39.

Marres

(2020) For a situational analytics: An interpretative methodology for the study of situations in computational settings. Big Data & Society 7(2): 1–16.

40.

Mnookin

Kornhauser

(1979) Bargaining in the shadow of the law: The case of divorce. Yale L J 88, 950.

41.

Nebeker

Dunseath

Linares-Orozco

(2020) A retrospective analysis of NIH-funded digital health research using social media platforms. Digital Health 6: 1–12.

42.

Pałka

(2025) Terms of service of social media platforms. In: Research Handbook on Social Media and the Law. Cheltenham: Edward Elgar Publishing, 166–183.

43.

Patro

Porcaro

Mitchell

, et al. (2022) Fair ranking: A critical review, challenges, and future directions. In: Proceedings of the 2022 ACM Conf. on Fairness, Accountability, and Transparency. New York, NY, United States: Association for Computing Machinery, 1929–1942.

44.

Perriam

Birkbak

Freeman

(2019) Digital methods in a post-API environment. International Journal of Social Research Methodology 23: 277–290.

45.

Perriam

Birkbak

Freeman

(2020) Digital methods in a post-API environment. International Journal of Social Research Methodology 23(3): 277–290. https://doi.org/10.1080/13645579.2019.1682840

46.

Perrino

(2023) Platform Accountability and Transparency Act reintroduced. In: Tech Policy Press. Available at: https://www.techpolicy.press/platform-accountability-and-transparency-act-reintroduced-in-senate/ (accessed November 21, 2024).

47.

Platform Accountability and Transparency Act (PATA), S (1876) 118th Cong. (2023).

48.

Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market For Digital Services and amending Directive 2000/31/EC (Digital Services Act), O.J. L 277, 27.10.2022, at 1–102

49.

Rogers

(2021) Visual media analysis for Instagram and other online platforms. Big Data & Society 8(1): 1–23.

50.

Rosenberg

, et al. (2018) How Trump consultants exploited the Facebook data of millions. The New York Times. Available at: https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html (accessed November 21, 2024).

51. Rub (2017) Copyright survives: Rethinking the copyright-contract conflict. Virginia Law Review 103: 1141–1248.

52. Rubinfeld, Gal (2017) Access barriers to big data. Arizona Law Review 59(2): 339–382.

53. Sales (1953) Standard form contracts. Modern Law Review 16: 318–342.

54. Schwalbach, Mauer (2025) Sharing digital trace data: Researchers’ challenges and needs. Big Data & Society 12(1): 5.

55. Solove, Hartzog (2025) The great scrape: The clash between scraping and privacy. California Law Review 113: 1521–1584.

56. Suzor (2018) Digital constitutionalism: Using the rule of law to evaluate the legitimacy of governance by platforms. Social Media + Society 4(3).

57. Taylor (2011) Consumer-driven changes to online form contracts. New York University Annual Survey of American Law 67: 371–431.

58. Van Loo (2022) Privacy pretexts. Cornell Law Review 108: 101–169.

59. Viljoen (2021) A relational theory of data governance. Yale Law Journal 131: 573–654.

60. Vincent (2021) Facebook bans academics who researched ad transparency and misinformation on Facebook. In: The Verge. Available at: https://www.theverge.com/2021/8/4/22609020/facebook-bans-academic-researchers-ad-transparency-misinformation-nyu-ad-observatory-plug-in.

61. Westin (1967) Privacy and Freedom. New York: Ig Publishing.

62. Wilkinson-Ryan (2014) A psychological account of consent to fine print. Iowa Law Review 99: 1745–1784.

63. Wiśniewska, Pałka (2023) The impact of the digital content directive on online platforms’ terms of service. Yearbook of European Law 42: 388–406.

64. Woods (2016) Against data exceptionalism. Stanford Law Review 68: 729–789.

65. X Corp. v. Bright Data Ltd., 733 F. Supp. 3d 832 (N.D. Cal. 2024).

66. Zarsky (2014) Understanding discrimination in the scored society. Washington Law Review 89: 1375–1412.
