In June 2024, a group of researchers gathered in Brisbane, Australia, at the second Wikihistories conference to discuss the state of Wikimedia research. While best known for its flagship project, Wikipedia, the Wikimedia universe includes projects for images and video (Wikimedia Commons), books (Wikibooks), dictionary definitions (Wiktionary), and structured data (Wikidata), among others. Over the past fifteen years, Wikipedia has increasingly become infrastructural as its data is woven into search and AI services. This shift in conditions led us to draft a manifesto clarifying and unifying critical Wikimedia research (Ford et al., 2025). Through its ten action points, it draws together critical scholars from disparate fields and urges them to fashion questions, concepts, and methods that are not only attuned to the datafied controversies and social concerns emerging now, but that also prepare the ground for the next generation of Wikimedia scholars and researchers. With these goals in mind, the manifesto offers a reoriented trajectory for Wikimedia research in the age of data and AI, one that prepares researchers to work together “to interrogate and reconstitute the knowledge commons.”
The first action of the manifesto is to map the dispossession of the commons. Notably, one of the most radical elements of Wikipedia was its capacity not just to collect the world's knowledge, but to “liberate” it from restrictive copyright regimes (Pentzold, 2021, p. 817). Through summarising, paraphrasing, and quotation, Wikipedians effectively repossessed knowledge and then offered this content for free (under a Creative Commons license). However, Google and other corporations have recaptured this gift through corporate products and knowledge graphs that extract Wikimedia data without attribution and without channels for improvements to flow back upstream. Even more alarming is the extractive logic of AI that has been used to mine Wikipedia and Wikidata for semantic data, stripping them of their connections to Wikimedian labour (McDowell, 2024). In this situation, researchers have asked: What happens to the agency of internet users when facts are extracted from Wikipedia's editable environment and positioned as obscure data points in a knowledge panel (Ford and Graham, 2016)? What is the effect of the ‘paradox of reuse’ of Wikimedia content (McMahon et al., 2017)? What are the consequences of global platform-controlled knowledge graphs like the one operated by Google (Iliadis, 2022)? And if Wikimedia users are increasingly “realienated” from their labor, what future does this spell for Wikimedian projects like Wikipedia (McDowell and Vetter, 2023)? We must map these dispossessions to create a path – once again – to expanding and maintaining the “knowledge commons” (Hess and Ostrom, 2007).
These concerns point to the most consequential shift: recognizing Wikimedia's role as a hub of global knowledge infrastructure. We must extend our inquiry concerning Wikipedia to how people interpret shared social worlds. In other words, how does Wikimedia shape or determine reality (Ford, 2022; McDowell and Vetter, 2022)? In the first decade of research, these questions were mostly limited to studying and explaining how Wikipedia works. Now, it is necessary to define and research the emerging empiricism rooted in internet infrastructures that generate, store, and circulate data. This requires exploring how information enters Wikimedia projects, becomes assembled within Wikipedia, and is transformed into “datafied facts” by Wikimedia itself (Ford, 2022) and “fast facts” by corporations like Google, Amazon, Meta, Microsoft, and Apple (Iliadis and Ford, 2023). Accounting for these “chains of circulation” provides new ways to practically answer the philosophical questions of “What are facts? And how are they determined?” (Ford, 2022, p. 6). Research into this infrastructural construction of knowledge has already begun. To understand scientific chains of circulation, researchers have asked which sources are valued by Wikipedia (Lawrence and van Wanrooy, 2024, p. 3); whether Wikipedian articles “upend downstream models of science communication in which scientific facts are delivered to journalists and then to passive publics” (Moats, 2018, p. 9); and “how is information on and attention towards current events integrated into the existing topical structures of Wikipedia” (Gildersleve et al., 2023)? However, the pathways of knowledge are not limited to scientific and journalistic endpoints, but include search engines (Iliadis, 2022; Vincent and Hecht, 2021) and cultural institutions.
The manifesto calls on researchers to examine power relations, especially when Wikimedia projects are datafied. It raises questions about who has the social and cultural power to make and interpret datafied facts “made in the public sphere” (Menking and Rosenberg, 2020, p. 3). Likewise, it is crucial to question “[w]hat are the underlying logics about what knowledge is and who is best able to represent it on Wikipedia?” (Ford and Wajcman, 2017, p. 7). These broad questions related to democratic ideals make up one of the most significant considerations of critical Wikimedia research: how facts about marginalized publics are represented and which asymmetrical power relations emerge when marginalized publics participate in Wikimedia projects.
This concern has long been highlighted in research on Wikipedia's “persistent participation gaps” (Shaw and Hargittai, 2018, p. 148). These gaps have catalyzed activism to fill them, with questions arising about the place of activist communities like Art + Feminism “within the larger Wikipedia community” (Tamani et al., 2019) and about how “a queer feminist approach to media praxis [can open] up possibilities for a targeted digital activism” (Vetter and Pettiway, 2017). Ultimately, these questions require researchers to trace and ameliorate the injustices between counterpublics and a defensive public (Jankowski, 2024) that produce struggles concerning gender (Eckert and Steiner, 2013; Wagner et al., 2015); ethnicity and race (Adams et al., 2019; Lemieux et al., 2023; Mandiberg, 2023); as well as colonialism and Indigenous knowledge (Gallert and Van der Velden, 2013; Luyt, 2011). More needs to be done to understand how these conflicts shape Wikipedia's knowledges as inequalities in representation are intensified and/or rearranged with further datafication, and to expand inquiry to issues of class, ability, and nationality.
As this research suggests, examining power relations must involve exploring the juxtapositions of Wikimedia policies and practices. This can mean reviewing the ideals of Wikimedia, which includes asking, “How should we choose among conflicting authorities?” or, how do “enthusiasts realise the utopian vision associated with peer production?” (Leitch, 2014, p. 12; Pentzold, 2021, p. 817). However, it also means looking at how Wikipedia's rules have been used “to avoid addressing the inequitable treatment of content matter” (Gauthier & Sawchuk, p. 385). This concern has been directed explicitly at surveying how the “application of Wikipedia's notability guidelines play a critical role in the perpetuation of gender inequality” (Tripodi, 2023, p. 2). Ultimately, the reification of emergent practices through Wikimedia policies has led to other questions, such as “[w]hat kinds of women are favoured by Wikipedia in the creation of new articles” and which ones are not (Ford et al., 2023, p. 3)? While Wikimedia was founded on an inclusive and participatory dream of knowledge production, its chosen policies and practices do not automatically deliver these outcomes.
These practices of policy-reinforced exclusion can also play out in ways that demonstrate the need to investigate linguistic and cultural plurality. Researchers have asked about the “distinctive editing cultures […] that identify and highlight incompatibilities” between different language editions of the same articles, as well as “[h]ow do patterns of rule-making over time compare across autonomous communities” within Wikipedia (Rogers, 2019; Hwang and Shaw, 2022, p. 348). Notably, such differences in rule-making are not always neutral: some rules impact content “about marginalised communities” (Berson et al., 2021) or establish governance arrangements that make some language editions “more vulnerable to disinformation campaigns” (Kharazian et al., 2024, p. 1). Furthermore, Wikimedia's epistemic privileging of Western forms of evidence (such as visual knowledge communicated through text and image) is at odds with other knowledge practices. As such, researchers have asked whether local and historical oral sources could be accepted as legitimate in specific language editions (Avieson, 2022). Such comparisons decenter the privileged attention researchers have given English Wikipedia and open opportunities to examine how smaller communities within Wikimedia can innovate in ways that expand the knowledge commons.
While Wikipedia is organized through practices and policies, it also functions through, and is governed by, its technical structure. Therefore, an important area of inquiry is assessing algorithms’ implications. In this frame, Geiger (2017) led the way in analyzing how automation shapes Wikipedia by asking, “What does it mean to contribute [to Wikipedia] when that participation requires […] learning how to interact with all the bots and power tools that veteran Wikipedians rely on” (p. 3)? This might also mean asking questions about each project's interface, since its participatory affordances enable practices and policies. For example, we might ask how democratic ideals, such as consensus, become practical and meaningful “through the socio-technical structure of a digital platform” (Jankowski, 2022, p. 1). In other words, how are ideas and practices woven into the computational design of Wikimedia, and conversely, how do these mechanisms change the meaning of what counts as knowledge production and participation once they are encoded into the software?
It is important to note that most studies explain the relevance of Wikimedia projects in terms of how they represent “new” forms of knowledge production. However, the ideas, ambitions, and practices surrounding Wikimedia do not represent a radical break from the past. They are a continuation of it, and it is therefore necessary to historicize Wikimedia's epistemology. This can mean starting from the question of how Wikimedia projects align with and diverge from “the pursuit of the universal encyclopedia” (Reagle, 2010, p. 17). This can also mean examining the place of Wikipedia in the long history of encyclopedic production (Loveland and Reagle, 2013, p. 1306); how “enlightenment rhetoric persists in and about Wikipedia” (Vetter, 2020); or how its practices align with historical knowledge production (Apostolopoulos, 2024). But Wikimedia's knowledges are not only related to the history of encyclopedic texts. As a global knowledge infrastructure, the history of Wikimedia is intimately tied to the philosophical and cultural histories of science, technology, economics, libraries, archives, journalism, and education. To this point, we must ask: “What are the key principles that flow throughout the project, leaving their mark on everything else?” and how do these ideas connect to a more extended history of political or cultural thought (Tkacz, 2014, pp. 5–6)? Historians can play a crucial role in making sense of these genealogies that converge within Wikimedia projects.
Concerning science, it is important to note the value of following feminist theories of knowledge (Alcoff and Potter, 1993; Haraway, 1988), which convincingly argued that being objective means considering how knowledge is situated. This complicates the tendency of researchers to extract Wikimedian data as a “ground truth” for training algorithms and AI systems. For critical Wikimedia researchers, it is therefore necessary to avoid this epistemological blind spot by studying Wikimedia's data as partial, temporary, fallible, and shifting.
Researchers who acknowledge missing data, document the work they do to “clean” Wikimedia data, or reference the specific versions of Wikimedia pages in their quantitative research have started to uncover these features. In this vein, Hill and Shaw (2014) point out the implications of Wikipedia's “redirect” pages for creating datasets for analysis. Another area of concern is how Wikidata editors have addressed gendered data, which has included “several missteps and lack of care,” while others have attempted to “remedy them through practices of data repair […] and changes in policies and technical infrastructures” (Melis et al., 2024, p. 200). Focusing on tools for researchers, Weltevrede and Borra (2016) demonstrate the sociotechnical work that can be done to “showcase a controversy's evolution on Wikipedia” and make the shifting nature of the site more visible (p. 5). Underlying this approach is the recognition that data from Wikimedian projects emerge from fallible sociotechnical communities. These communities must therefore be studied on their own terms, as demonstrated in the foundational ethnographic work of Reagle (2010) and Jemielniak (2014).
In addition to treating Wikimedia data as situated and partial, critical Wikimedia researchers might situate their own research practice. For some researchers, this means using the feminist research practice of providing positionality statements that underscore how the researcher is impacted by or is complicit in power relations. For scholars using and studying data practices, it means reflecting on the meaning of producing “good data” that is not just a matter of how to extract and clean data. It includes considering which kinds of social data should not be collected, recognizing the lifecycle of the collected data, and reflecting on the consequences of publishing and aggregating that data for communities connected to it (Daly et al., 2019). Some of these concerns have been addressed by researchers who have developed Wikipedia-specific guidelines for conducting ethical research in Wikimedia communities (Pentzold, 2017; Wikipedia, 2022; Zent, 2024). Beyond participant observation, feminist research practices often highlight participatory forms of research, such as the model established by the Art + Feminism community. But there are other forms to consider. Inspired by infrastructure studies, researchers could consider what it would mean to conduct infrastructure walks (ten Oever and Maxigas, 2023) of Wikimedia projects. Or researchers might borrow the method of policy design workshops to do local work on specific Wikimedia reforms or address broader questions about the knowledge commons (Shulz et al., 2024).
To conclude, critical scholars of Wikimedia need a manifesto for two key reasons. First, critical/humanist Wikimedia researchers are diffused across science and technology studies, media studies, anthropology, computer science, communication, literature, and history, among many other fields. While this variety is fruitful for creating diverse perspectives, the field has also suffered from disciplinary disconnection. The Wikimedia Foundation itself has taken steps toward addressing some of these issues with its annual Wiki Workshop and Wikimedia Research Showcase. However, these venues continue to reflect the condition that “most of the published research work related to Wikipedia is based in the sciences or engineering disciplines” (Hill and Bayer, 2023), which have typically ignored the kinds of critical approaches led by humanities and social science disciplines that we highlight here.
Therefore, we advocate building a shared project of critical investigation across disciplines that sets the agenda from a humanities perspective. One such action is supporting and growing critical researcher networks. This can occur through institutional funding by universities and the Wikimedia Foundation for interdisciplinary symposiums, as well as PhD and postdoctoral positions in humanities-related disciplines. Secondly, while ethnographies of the (English Wikipedia) community and content analyses of Wikipedia's data served researchers well in the first two decades of Wikipedia research, the phenomenon under analysis has shifted as Wikimedia has become a vital global knowledge infrastructure. The meanings encoded in Wikimedia's articles, entities, and items have expanded beyond the platform into the everyday lives of millions, performing as consensus truth in a myriad of new forms. To address the evolving challenges of the knowledge commons in the age of large-scale data extraction by generative AI, we must reassess existing methods, experiment with new approaches, and develop concepts that bridge different fields. With these calls to action, we are hopeful that we can thoughtfully respond to the changing conditions and necessities of doing research about Wikimedia as its place within global knowledge infrastructures becomes increasingly vital.