Abstract
For 17 years, Twitter/X provided free academic access—not by law, but by social contract—to its data for researchers examining elections, hate speech, and other crucial public issues. It took just six months under Elon Musk to rescind this unfettered access and upend global scholarship.
On June 23, 2023, at the Information Sciences Institute at the University of Southern California, Dr. Kristina Lerman received an email from a graduate student. Data collection from Twitter had abruptly stopped. “We lost our data collection privileges,” the student said. Their servers were displaying a status message that read: “0% tweets pulled of 1,500. Monthly cap usage.” As of the day before, their servers were collecting 10 million tweets a month, per researcher.
“Looks like the end is here,” the student concluded.
Since February 2023, rumors had been circulating that Twitter was going to cut off free access to its data, after nearly two decades of data transparency and ongoing support of academic research. The cutoff would mean that researchers could no longer examine the flow of information moving across Twitter’s network. Researchers had been using free academic access to investigate the speed, distance, virality, and media effects of conversations taking place on Twitter for years. The end of Twitter’s policy of data transparency would go on to disrupt research projects around the world that were focused on critical issues, such as misinformation and manipulation campaigns, the pandemic, social movements, hate speech, threats, and violence. For example, University of Copenhagen professor Yevgeniy Golovchenko was using free academic access to monitor Russian misinformation campaigns in Ukraine, but his work was going to end with the Twitter cutoff.
As concerns about the end of data transparency at Twitter grew in early 2023, more than 100 civil society groups signed an open letter written and posted by the nonprofit Coalition for Independent Technology Research, questioning Twitter’s plan to impose fees for research access to its data. Of course, data transparency is voluntary, at least in the United States: under current law, the U.S. government does not require social media companies to provide data access, although at least one data transparency bill, S.5339, was proposed in the U.S. Senate in 2022. In the United States, independent researchers can examine the content, sources, and effects of online speech only when social media companies practice data transparency voluntarily.
While there is no legal obligation in the United States for platforms to provide data transparency, doing so supports the democratic functions of the public sphere.
Open Doors at the Old Twitter
Six months after Twitter launched in 2006, the company opened its data to developers via an application programming interface (API), a connecting layer between the developers submitting queries and the data on Twitter’s servers. An API lets third parties query and collect data through defined endpoints without altering the underlying data.
Early versions of the Twitter API were intended for software developers. It was not easy for researchers to collect data as needed. Nevertheless, a community grew around studying Twitter, a field sometimes called “Twitterology,” because the other dominant platforms were not as forthcoming. Twitter later started to cater to researchers with a dedicated webpage and updates to its policies. For example, in 2020, Twitter released a COVID-19 endpoint for researchers examining the spread of misinformation about the pandemic.
In October 2020, Twitter opened a new academic track in beta, and the API was fully opened for public research in January 2021. The new version of the free academic API allowed for more precise queries. Researchers could add up to 1,000 rules when using the filtered stream, compared to 25 rules in the standard API. Moreover, query strings in the academic API could be up to 1,024 characters long, compared to 512 characters in the standard API. To use the academic API, researchers had to verify their university affiliations and state their objectives. Approval usually arrived within weeks. With access keys in hand, researchers could use the academic API to design and maintain data collections, including storing historical Twitter data on their own servers.
As of mid-2022, the trend at Twitter was still moving in the direction of increasing support for research. In September 2022, the company issued a press release about forming a research consortium. The first line said: “Twitter is committed to data-driven transparency.” Dr. Lerman was a member of the company’s academic research advisory board, first formed in 2021, but after Elon Musk’s acquisition in October 2022, the board was ignored and eventually disbanded. Twitter’s Trust and Safety Council was also dissolved after the Musk takeover. Its members resigned in protest and issued a statement about their concerns.
Because Musk claimed to be a free speech absolutist, observers had expected he would change moderation policies. Few anticipated that he would end the long-standing policy of open data for research, including the free academic API. But on February 1, 2023, Twitter’s developer team, known by its handle @TwitterDev, tweeted that free access to data via the API was going to end and that paid API access would be offered instead. Users responded by debating whether universities should have to pay for data.
Public Relations Transparency?
At USC, Dr. Lerman heard rumors from students that the Twitter academic API was going to die on February 13, 2023.
“We find out through rumors,” she commented. “We were saying ‘Oh no, quick, quick, quick.’ All the students were sharing and trying to max out their collections of data needed for projects. Then February 13 came, and nothing happened.”
On March 27, 2023, Twitter announced its new API tiers. There were three tiers at first, and a fourth was added later. (The academic API had not yet died, but its demise was expected.) The March announcement offered nothing specific for academics. When looking at the highest level, known as the Enterprise Tier, academics were shocked to discover that it would cost $42,000 a month to access large-scale datasets. “We cannot afford to pay those kinds of fees,” Dr. Lerman said. “Some of our datasets have two billion tweets. That would be hundreds of thousands of dollars.”
Also on March 27, 2023, @TwitterDev tweeted a vague promise of things to come for researchers: “For Academia, we are looking at new ways to continue serving this community. In the meantime, Free, Basic and Enterprise tiers are available for academics.”
The lower access tiers—one free, the other costing $100/month—were not useful for academic research. The free tier was limited to 1,500 tweets a month. Neither of the lower tiers offered historical data.
Dr. Michael Dow is a linguist at the University of Montreal. One of his projects examines public opinion about the use of personal pronouns in the LGBTQ+ community. His research team faced a series of obstacles with the looming shutdown and loss of data transparency. In response to the vague tweet about promises of things to come, Dr. Dow tweeted: “So you’re leaving us in the dark without access and without a plan, all the while we have to continue research and writing grants? Seriously, is that what you’re doing?”
In a later interview, Dr. Dow emphasized the need for planning. “We have to budget for this research and hire researchers,” he said.
Back at USC, long after the day data transparency died at Twitter, Dr. Lerman still thinks about a research initiative on social media and mental health that ended with the Twitter cutoff. The project examined how people searching online for health information can be lured into fringe communities promoting self-harm behaviors such as cutting. When the Twitter API endpoints abruptly closed, Dr. Lerman had just completed a pilot study that identified several online communities encouraging young users to engage in self-harm.
“I would have liked to delve more deeply into it,” Dr. Lerman said.
Data and Democracy
Because the First Amendment limits government regulation of the media, data transparency is primarily an issue of corporate responsibility in the United States. The idea that the dominant social media platforms have obligations to the public is generally accepted. Certainly press freedom has long been tied to press responsibility, in theory and in practice. Nearly all of the dominant platforms have terms of service (ToS) about data transparency, harmful content, and related topics. The ToS policies are not enforced by law, at least in the United States, but do express a social contract with the public.
Even if the existence of a social contract between social media companies and the public is debatable, one point is certain: when Twitter and other dominant platforms limit data access for independent research, social media users continue to have their personal data collected and monetized while having little reliable information about, or control over, how that data is used. The issue of data transparency is not going to go away for Twitter and other dominant platforms anytime soon. The European Union recently enacted digital regulations that require the dominant platforms to file transparency reports starting in 2024.
Maziyar Panahi, a researcher at the Institut des Systèmes Complexes de Paris, had been using the free academic API at Twitter for over a decade. Like Dr. Lerman at USC, he heard rumors about a cutoff, but only discovered that Twitter data access had, in fact, ended when he saw notifications on his servers that they had stopped collecting data.
“The shutdown had a profound impact on our work as researchers,” Panahi said. The institute where Panahi works in Paris has both national and European projects that rely heavily on Twitter data. It is participating in the project known as “The Narratives Observatory Combatting Disinformation in Europe Systemically” or NODES.
“A key component of our role is to collect multilingual data from Twitter, particularly focusing on topics related to COVID and climate change,” Panahi said. “This move by Twitter made it impossible to deliver our parts for this project.”
The ability to access Twitter data was, at one time, invaluable to researchers looking to understand public events. Disinformation campaigns regarding elections and pandemics are, after all, part of global conversations taking place on social media. Twitter, now known as X, has watched its dominance wane, as users and advertisers respond to changes in company policies and experiment with new, similar platforms, including Threads, Mastodon, and Bluesky. Arguably, these shifts could make Twitter/X data less valuable for research about public opinion and media effects. The value of Twitter/X data has become an unknown because it is now locked behind corporate walls. Dr. Lerman suggested that ongoing data transparency would benefit the company and the public alike.
Twitter/X’s new API tiers, as of March 2024 (developer.twitter.com screenshot). An academic tier is notably missing.
“To save humanity, they should be transparent,” she said. “But it would be in their long-term interest as well.”
Elon Musk’s own statements imply a connection between transparency and trust. Shortly after he bought Twitter, Musk released “The Twitter Files,” a set of internal documents about content moderation. On November 28, 2022, Musk tweeted: “The Twitter Files on free speech suppression soon to be published on Twitter itself. The public deserves to know what really happened.” Later that same day, he wrote ominously: “This is a battle for the future of civilization. If free speech is lost even in America, tyranny is all that lies ahead.”
Having free and robust debate in the public sphere about the issues of the day has been an important rationale for free speech and press since the 17th century. Without the marketplace of ideas, which requires freedom of speech and the press, it is difficult for thought leaders to assess a range of competing facts and opinions when debating questions of social policy. Dr. Lerman believes data transparency is a social obligation because, for example, unless researchers can establish where online threats are coming from, there is no way to mitigate the risks.
“Twitter wanted to increase transparency, to increase trust,” Dr. Lerman said. “Giving academics access to Twitter data through the academic research API was the answer to that.”
According to democratic theory, the quality of the public conversation relies on a social contract between the press and the public, as well as press freedom from regulation. This social contract includes corporate commitments to transparency and accountability. While Elon Musk and others at Twitter/X claim to be in favor of free speech and against government regulation of the media, it is questionable whether they recognize the corresponding link between press freedom and press responsibility, and the link between data transparency and public trust.
Although Musk said he was going to make Twitter into more of a free speech platform, advertising losses might have changed his tune. When the company started to promote its policy of “free speech, but not reach”—meaning that some allegedly harmful, online content would be de-amplified or not monetized—news reports about increasing levels of hate speech on the platform led to a drop in ad revenues during the first year of Musk’s management. The Center for Countering Digital Hate (CCDH) is the nonprofit research organization behind the news reports about the increased hate speech on Twitter after Musk’s acquisition. In July 2023, Twitter, using its new name X Corp., filed a lawsuit against CCDH. The lawsuit alleged that CCDH was funded by Twitter competitors and foreign agents.
For its part, CCDH maintained that the lawsuit was an attempt by Twitter/X to silence the free speech of public-interest researchers. In March 2024, a federal court in San Francisco dismissed all of the claims Twitter/X had brought against CCDH on free speech grounds, finding that the lawsuit was an attempt to punish the nonprofit for its speech.
Social Impact and Media Spectacle
Ever since Elon Musk bought Twitter in 2022, media outlets have aimed the spotlight on the spectacle of his impulsive style of leadership. First, the company changed its name to X with a late-night tweet, not a corporate press release. The re-branding was haphazard. For a while, the bluebirds were gone, but posts were still called tweets. Then there was the news that Musk put a giant glowing X on the top of the company building in San Francisco without a city permit, upsetting local residents. Journalists also extensively covered complaints about Musk’s decision to convert the blue-check system, once meant to verify high-profile accounts, into a pay-to-play system. And then there were the memes, posts, and stories about Musk challenging Mark Zuckerberg to a cage fight.
A media spectacle can be used to distract attention from social issues and inequality. With the Musk takeover spectacle, it might have been easy to miss the story of how Twitter/X shut down its academic API. Yet the end of data transparency at Twitter/X remains an important issue moving forward.
In the summer of 2023, Twitter/X CEO Linda Yaccarino claimed, in a company memo, that Twitter was “on a mission” to become the most accurate information source in the global town square. Whether talking about the public sphere or the town square, the problem remains the same. User-generated content flowing over Twitter/X’s network is no longer subject to independent research in the public interest. According to the Coalition for Independent Technology Research and the other civil society actors that signed on to its open letter, the public has a right to conduct research into the social impacts of social media.
Dr. Kai-Cheng Yang is a researcher who works at the Observatory on Social Media at Indiana University. He recently commented on the public relations efforts of the dominant platforms, for instance when they occasionally release reports about reducing harmful content on their networks. While public relations is important, Dr. Yang said, providing access to network data is the key.
“A lack of third-party involvement makes it impossible for outsiders to gauge the quality of these reports,” Dr. Yang said. “But if they make their data available to the outside, countless researchers could and would study different types of issues on the platforms from different angles.”
When Twitter closed the academic API, Dr. Yang did not bother to reach out to the company for answers. Twitter had long since shut down its communications department. He did talk to a few sales reps, telling them that researchers could not afford the Enterprise Tier at $42,000 a month.
“Although the sales reps were sympathetic, there was not much they could do about it,” Dr. Yang recalled.
In the Long Run
The national and international laws regarding data transparency may change in the long run. Company policies and practices may change as well. There have been dramatic changes in European law, and changes in U.S. law appear to be on the horizon.
Yet for now, academic researchers are continuing their work using other data. Dr. Thu Nguyen is the director of the Big Data for Health Equity (BD4HE) Research Collaborative at the University of Maryland, College Park, where she studies how to improve health equity. Dr. Nguyen’s work requires large datasets. One of her team’s recent projects linked racist discourse on Twitter/X to social inequality; it required access to millions of tweets via the free academic API. In another study, Dr. Nguyen’s team analyzed a random sample of over 56 million tweets across a 10-year period. They found that negative views expressed on Twitter toward racially minoritized groups are significantly associated with higher risks of adverse birth outcomes. Dr. Nguyen now needs to locate other data sources, because her projects about Twitter discourse ended the day data transparency died.
“How is hate speech changing on Twitter? That’s an important public health question,” Dr. Nguyen said. But for now, it’s one that can’t be answered.
Timeline: Twitter/X and Data Transparency
Twitter’s data made available for free to researchers
Twitter offers new tools for social researchers
Twitter creates Moderation Research Consortium
Elon Musk buys Twitter for $44 billion
Twitter stops enforcing COVID misinformation policy
Twitter Files released because “public deserves to know”
Hate speech is increasing on Twitter, research report finds
Twitter announces the end to free data access
Coalition issues open letter in support of public research
Second coalition letter signed by 100 civil society organizations
Twitter launches monetized access tiers with no research tier
New Twitter CEO claims platform is “global town square”
Ad revenue falls as advertisers worry about harmful content
Thousands of research projects end as access keys deactivated
Twitter starts changing its brand to X
X files a lawsuit against a nonprofit researching hate speech
