Decoding revision mechanisms in Wikipedia: Collaboration, moderation, and collectivities

Abstract

Research on knowledge collaboration in Wikipedia has predominantly focused on article-level metadata or editor-centric analyses, often overlooking the complexities of knowledge collaboration and its contextual dependencies. This study takes a novel, fine-grained approach to investigating revision mechanisms in Wikipedia’s knowledge collaboration. By treating modified sentences as carriers of collective knowledge and as spaces in which epistemic power is negotiated, it reconstructs their revision sequences and examines how editorial, contextual, content, and temporal factors shape Wikipedia’s revision dynamics. A total of 140,593 revisions (by 48,643 editors) of 76,525 sentences in 537 Wikipedia articles related to climate change were analyzed using text mining, natural language processing, survival analysis, and meta-analysis. The findings expand our understanding of how epistemic power is negotiated through collective endeavors underpinned by bureaucratic rules and community moderation in Wikipedia.

Keywords

Climate change, collective action, computational methods, epistemic power, governance, knowledge collaboration, moderation, natural language processing, survival analysis, Wikipedia

Introduction

In the post-truth era, online platforms struggle to cope with the proliferation of disinformation and misinformation. Despite deploying a mix of paid human labor, automated systems, and third-party or community-based fact-checking, social media giants such as YouTube, X (formerly Twitter), and Facebook continue to face criticism over the quality and reliability of the information they host. By contrast, Wikipedia, an encyclopedia everyone can edit, stands out as a bastion of trust (Bruckman, 2022; McDowell and Vetter, 2021). As “commons-based peer production,” Wikipedia is created, coordinated, and contributed to on a non-proprietary basis by hundreds of thousands of self-selected, widely distributed, and loosely connected individuals (Benkler, 2006). On the one hand, broad participation, loose connections, and diverse expertise and interests reflect “many minds collaborating” (Niederer and Van Dijck, 2010; Ren and Yan, 2017). On the other hand, Wikipedia’s open nature exposes it to vandalism, conflicts, and disagreements among editors, which can compromise accuracy and consistency and therefore require effective governance. This raises the question of how this collaborative model has enabled Wikipedia to become one of the most influential and reliable sources of information.

Understanding how knowledge is collaboratively shaped and maintained in Wikipedia has long been a subject of inquiry across disciplines (Ren et al., 2023). Empirical studies have either illustrated how editors debate sources, reference guidelines, and negotiate consensus in contentious settings or attempted to generalize patterns such as working routines or editor dynamics. Qualitative studies, however, do not allow inferences about how different factors, such as contestations over the meaning of individual sentences or conflicts arising from violations of community-defined rules, affect knowledge collaboration. Quantitative studies, meanwhile, cannot fully capture the complexities of knowledge collaboration, because they estimate statistical models based only on article-level metadata or from an editor-centric perspective, without reflecting the contextual environments in which each piece of knowledge is shaped and maintained.

To address these limitations, this study scrutinizes the mechanisms of how revisions shape and maintain collective knowledge, guided by the overarching research question (RQ1): What organizational patterns structure Wikipedia’s revision processes? Specifically, it analyzes the revision dynamics at the sentence level, guided by the operationalized research question (RQ2): What factors shape the revision mechanisms? This study introduces an explanatory model that identifies the mechanisms influencing knowledge persistence and modification across four dimensions: (1) editor characteristics, (2) context dependencies, (3) content relevance, and (4) temporal factors. It examines 140,593 revisions (by 48,643 editors) of 76,525 sentences in 537 English-language Wikipedia articles related to climate change as a case study. The contributions of this study are twofold: Methodologically, it offers a novel, fine-grained perspective on the actual revision processes by reconstructing the evolution of sentences, operationalizing key dimensions influencing knowledge collaboration, and estimating their impacts on knowledge collaboration. Empirically, the findings provide insights into the tensions raised by self-organizing in knowledge collaboration and corresponding responses through peer and bureaucratic control, revealing how power is negotiated through collective endeavors of mutual observation, iterative refinement, and community moderation.

Knowledge collaboration in Wikipedia

Knowledge collaboration in Wikipedia is enabled in two main ways. Technologically, Wikipedia’s web application facilitates coordination. Each article has a main page that displays the current version of the collective knowledge on a particular topic, a “talk” page where editors discuss how to write content they can agree on, and a “history” page for version control. Normatively, platform rules and guidelines safeguard the quality and reliability of the collective knowledge. The content policies, namely neutral point of view (NPOV), verifiability, and no original research, determine the nature of the information presented on Wikipedia, whereas the conduct policies regulate editors’ behavior.

The platform facilitates both indirect and organized coordination. Indirect coordination occurs on the article’s main page, where any editor, including anonymous editors, can edit most Wikipedia articles directly and have the edited version displayed as the current version. This open model enables stigmergic coordination, a self-organizing mechanism in which individuals respond to and build upon existing work without direct communication or predefined planning (Heylighen, 2016). Editors focus on the current state of the article, working independently without negotiation, imposed sequences, or division of labor (Heylighen, 2016). By contrast, organized coordination occurs chiefly on the talk page, where most discussion posts on the relevant article contain requests or suggestions for coordination (Bipat et al., 2018; Morgan et al., 2013; Viegas et al., 2007). In both indirect and organized coordination, editors cite policies and guidelines when they evaluate, process, and validate the collective knowledge (McDowell and Vetter, 2021). These policies and guidelines enable self-governance and help solve coordination and communication problems (Beschastnikh et al., 2021; Butler et al., 2008).

Collaborative processes involve an interplay between knowledge, actors, agents, rules, and norms over time. As disagreements, negotiations, and controversies inevitably arise, policies and guidelines grow and adapt to cover diverse concerns, ranging from conflict resolution and decision-making to newcomer socialization (Osman, 2013). Automated tools, such as bots, have been introduced to handle reliability checks, vandalism, and edit wars more efficiently at scale. To avoid systemic bias and facilitate accountability for specific topics, voluntary task forces have established WikiProjects to assist in coordinating articles on particular topics (Bruckman, 2022). These collaborative initiatives and applications give rise to a constantly evolving landscape of knowledge collaboration in Wikipedia that seeks to reconcile competing interests and foster a cohesive understanding of collective knowledge (McDowell and Vetter, 2021).

Collectivities, organizing, and governance

Social science studies initially expected that Wikipedia, lacking a bureaucratic organization and relying on a voluntary community, would exhibit deliberative practices that generate a participatory, discourse-centric, and decentralized model of collaborative knowledge production (Benkler, 2006; Pfister, 2011). In practice, however, Wikipedia is guided by a bureaucratic oversight mechanism consisting of community moderation and peer control (Jemielniak, 2014; König, 2013; Niederer and Van Dijck, 2010).

Shaw and Hill (2014) identified an “iron law of oligarchy” in Wikipedia, whereby power is monopolized and consolidated within a small group of early editors. Of the more than 48 million registered editors, about 0.24% edit regularly, and only 0.01% edit actively (Wikipedia contributors, 2024). Based on their contributions and community consensus, editors have different access levels, such as unregistered editors, registered editors, administrators, and bureaucrats. These roles determine not only editors’ activities but also the organizational structure (Arazy et al., 2015). For example, administrators can protect articles from editors without advanced permissions, whereas unregistered anonymous editors can edit only unprotected pages. Compared with unregistered editors, registered editors make more reliable contributions (Anthony et al., 2009). The contributions of new editors are more likely to be reverted by experienced editors, which reduces the retention of desirable newcomers (Halfaker et al., 2011, 2013).

Moreover, Wikipedia’s governance extends beyond spontaneous collaboration to include bureaucratic rules and peer control (Jemielniak, 2014; Matias, 2019; Rijshouwer et al., 2023). Instead of relying solely on stigmergic spontaneity and good-faith collaboration, the growing body of rules, guidelines, and automated tools enforced by editors ensures the quality of peer contributions and increases efficiency in performing tasks (Hwang and Shaw, 2022; Jemielniak, 2014). Peer control arises from the collective efforts of editors engaging in mutual scrutiny and editorial surveillance, enabled by technological affordances such as revision tracking, discourse monitoring, and IP address logging (Jemielniak, 2014; Pentzold, 2017). Rijshouwer et al. (2023) argue that Wikipedia’s shift from a charismatic, open, self-organizing community to a bottom-up bureaucracy is a necessary adaptation to its growing size and complexity. The legitimacy of integrating bureaucratization as an essential part of developing and maintaining a self-organized culture depends on community members rather than on authorities or elites within the community (Jemielniak, 2014; Matias, 2019; Rijshouwer et al., 2023).

The discussion of the organizational structure in Wikipedia leads to the expectation that editor characteristics, including the editor’s role and previous engagement, shape revision processes. Regarding the editor’s role, edits made by editors with high access levels tend to remain more stable, whereas those revised by editors with low access levels are more vulnerable to revision. Editors with high access levels often play a more active role in the revision processes, overseeing and refining content. Regarding the editor’s engagement, reactive editors are inclined to create recurrent revisions, reinforcing both the iterative nature of Wikipedia’s knowledge construction and the community’s moderation efforts. Keegan et al. (2016) showed that Wikipedia relies heavily on single-contributor work sessions rather than multi-contributor teamwork. Thus, edits made by solo editors who expand their edits over multiple revision sessions are also expected to trigger revisions.

Power, moderation, and collaboration

Wikipedia’s revision processes reveal the complex power dynamics underlying moderation, contestation, and collaboration. Through the stigmergic affordances of the platform, individuals can reinforce interpretations, remove alternative perspectives, or contest revisions by unilaterally changing or reverting the presented content without informing or discussing it with others. Although this can cause conflicts, vandalism, and edit wars, Wikipedia’s governance constrains the exercise of power in such actions through community moderation processes and automated tools. The quality and robustness of articles, particularly those on contentious topics, remain high (McDowell and Vetter, 2021).

This gatekeeping process is circular: the distinction between the gated and the gatekeepers is fluid. Despite the uneven distribution of privileges and responsibilities among editors, all editors are embedded in different accountability networks of community moderation, in which one can serve as a gatekeeper in one network while being regulated by others in another (Nahon, 2011; Pentzold, 2017). As projects scale, even the most experienced editors with the highest access level cannot oversee or control the entire body of collective knowledge; instead, they must engage in a continuous process of mutual observation and negotiate their work with the community and other editors (Jemielniak, 2014; Matias, 2019; Rijshouwer et al., 2023).

In revision processes, power is distributed and managed through editors’ interactions with fellow editors, the community, and the wider system that relies on their labor (Matias, 2019). Beyond the roles delineated by access levels, several scholars (Arazy et al., 2016; Arazy et al., 2015; Faraj et al., 2011; Maki et al., 2017) have identified various functional roles, such as “shaper” and “defender,” that editors create and adopt to meet the collaborative needs of a given moment. By taking on these roles, editors respond to the evolving needs of the community, such as resolving content disputes, enforcing policies, improving article quality, or addressing vandalism, based on their understanding of where their contributions will be most valuable (Faraj et al., 2011).

The literature supports the expectation that contextual dependencies serve as key drivers of revision mechanisms. On the one hand, moderation frequently drives revisions, particularly when editors engage in disputes grounded in content or conduct policies, or when they correct misleading information. Meanwhile, moderated edits can prompt further revision as editors refine or contest changes made through moderation. On the other hand, edits that require editorial engagement and community support for refinement often undergo additional revisions.

Decoding revision mechanisms

This paper examines knowledge collaboration on climate change as an exemplary case, given the inherently diverse perspectives and viewpoints involved in the discourse. Previous studies on climate change in Wikipedia (Esteves Gonçalves da Costa and Cukierman, 2019; Weltevrede and Borra, 2016) have found that only a limited number of heavily edited topics within climate change attract most reverts and instances of vandalism. This suggests that revision mechanisms across the related articles as a whole are relatively stable and not heavily influenced by vandalism, conflicts, or sudden critical issues. However, these studies examined only a small number of Wikipedia articles, are primarily descriptive, and do not estimate how specific factors, such as questioning neutrality, shape knowledge collaboration.

Quantitative studies have examined various aspects of the general processes of knowledge collaboration in Wikipedia articles, such as community performance (Ren and Yan, 2017), participatory hierarchies (Champion and Hill, 2024; Halfaker et al., 2011, 2013; Shaw and Hill, 2014), and working routines (Arazy et al., 2016, 2020). However, most of these studies are solely actor-centric: they rely on aggregated metadata, such as the number of edits made by the involved editors, and focus exclusively on the article level, neglecting the complexities of knowledge collaboration and its contextual dependencies. Some studies have applied relational event models (Bürger et al., 2023; Lerner and Lomi, 2020) or sequence analysis (Arazy et al., 2020; Keegan et al., 2016) to investigate how different editor groups revise controversial articles. These approaches arrange an article’s revisions into sequences or map revision sequences onto dynamic networks among editors. However, the mechanisms driving revisions remain underexplored: it is difficult to discern whether the resulting network structures reflect individual decisions based on content-related factors or group dynamics within the community of editors.

This study fills this research gap by investigating Wikipedia’s sentence-level revision mechanisms. In particular, it views modified sentences as carriers of collective knowledge and as spaces in which contestations and collaborations occur. From a heuristic perspective, editors tend to monitor articles passively and engage in editing only when they perceive a need to add, delete, or modify content. More directly than articles, sentences embody the fundamental characteristics of Wikipedia (Benkler, 2006): modularity (consisting of small modules of tasks) and granularity (having the capacity to scale modules). I highlight the importance of accounting for both the content and the context of knowledge collaboration, because it is the ideas, rather than the editors, that undergo massive changes in knowledge collaboration (Faraj et al., 2011: 1230). Multiple ideas may evolve through a dynamic process of divergence and convergence, in which distinct perspectives and contributions from various editors lead to diverse development paths (Faraj et al., 2011). Therefore, I argue that analyzing the sequential trajectories of revisions can provide a nuanced understanding of how each idea is shaped and maintained over time.

Building on the theoretical framework outlined in the previous sections, this study proposes an explanatory model of Wikipedia’s revision mechanisms at the sentence level. It identifies four dimensions that shape knowledge persistence and modification. In addition to (1) editor characteristics and (2) contextual dependencies, the primary drivers that directly shape knowledge persistence and modification, the model includes two further dimensions, (3) temporal factors and (4) content relevance, to account for variations in revision dynamics. Content relevance is expected to affect the editorial attention a sentence receives: the more relevant or contested the content, the more likely it is to be revised. Temporal factors reflect content stability: sentences that have already undergone multiple modifications are expected to be better established and thus less likely to be revised further. Together, these dimensions form the basis for analyzing Wikipedia’s revision mechanisms.

Data and methods

Data

Via the Wikipedia API, I collected all revision details and editor information for 1,000 English-language articles within the scope of WikiProject Climate change,¹ covering the period from each article’s creation until the day of data collection (7 December 2021). The articles span different areas related to climate change or relevant to climate communication, such as “Greta Thunberg,” “Antarctica,” “cattle,” and “Sustainable Development Goals.” I first identified the modified sentences in each revision and then calculated their pairwise semantic similarity. Based on semantic similarity and temporal distance, a revision sequence was created for each sentence (Supplemental material).² Each sequence captures the evolution of a sentence, including its (1) editor characteristics, (2) context dependencies, (3) content relevance, and (4) temporal factors (Figure 1).
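The actual linking procedure is described in the Supplemental material; the following is only a minimal sketch of the idea of chaining modified sentences into per-sentence sequences. It assumes difflib's character-level ratio as a stand-in for the semantic similarity used in the study, a hypothetical threshold of 0.6, and recency as a tie-breaker in place of the study's temporal-distance criterion. All names are illustrative.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for semantic similarity: difflib's character-based ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def build_sequences(revisions, threshold=0.6):
    """Link modified sentences across revisions into per-sentence sequences.

    revisions: chronologically ordered list of (timestamp, [modified sentences]).
    threshold: hypothetical minimum similarity for continuing a sequence.
    Returns a list of sequences, each a list of (timestamp, sentence) pairs.
    """
    sequences = []
    for ts, sentences in revisions:
        for sentence in sentences:
            best, best_score = None, threshold
            for seq in sequences:
                last_ts, last_sentence = seq[-1]
                if last_ts == ts:
                    continue  # this sequence was already extended by the current revision
                score = similarity(sentence, last_sentence)
                # Keep the most similar candidate; on ties, prefer the most recently edited one.
                if score > best_score or (
                    score == best_score and (best is None or last_ts > best[-1][0])
                ):
                    best, best_score = seq, score
            if best is None:
                sequences.append([(ts, sentence)])  # no match: start a new sentence sequence
            else:
                best.append((ts, sentence))
    return sequences
```

For example, a sentence edited from "about 1 degree" to "about 1.1 degrees" continues its existing sequence, while an unrelated sentence added in the same revision starts a new one.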

Figure 1.

Data structure.

Temporal variables

I converted revision timestamps to Unix time to compute the temporal variables. The time between revisions was calculated as the difference between the creation time of a revision and that of the subsequent revision. The analysis included, at the sentence level, the creation time of the first edit (start) and, for each revision, the revision time (time) and the count of previous revisions (parity).
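The MediaWiki API returns revision timestamps as ISO 8601 strings in UTC (for example `2021-12-07T00:00:00Z`). A minimal sketch of the conversion and of the per-revision variables, assuming that one sentence's revision timestamps are already collected; the function names and the extra `gap` field (time to the next revision) are illustrative, not taken from the study's code.

```python
from datetime import datetime, timezone


def to_unix(ts):
    """Convert an ISO 8601 UTC timestamp ('YYYY-MM-DDTHH:MM:SSZ') to Unix seconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def temporal_variables(revision_timestamps):
    """Derive start, time, parity, and gap for each revision of one sentence."""
    times = sorted(to_unix(ts) for ts in revision_timestamps)
    start = times[0]  # creation time of the sentence's first edit
    rows = []
    for parity, t in enumerate(times):  # parity = number of previous revisions
        # Time until the subsequent revision; None if no later revision was observed.
        gap = times[parity + 1] - t if parity + 1 < len(times) else None
        rows.append({"start": start, "time": t, "parity": parity, "gap": gap})
    return rows
```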

Content

I extended Borra et al.’s (2014) approach to quantify the relevance of individual terms and revised sentences, using wikilinks and editor interactions related to the revised sentences as indicators. Wiki links³ are textual elements that connect to other Wikipedia articles, enhancing readers’ comprehension by linking to relevant topics, technical terms, or unfamiliar proper names (Borra et al., 2014). The method extracts the terms that editors have highlighted through wikilinking and uses them to locate disputed content. A wikilink can serve as a reference point for broader meanings, revealing the kinds of content that are more likely to prompt revisions. Terms are weighted by the frequency of revisions involving sentences that contain them, and these weights serve as term relevance scores. The relevance score of a sentence is calculated by averaging the relevance scores of the terms it contains.
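In wikitext, wiki links are written as `[[Target]]` or `[[Target|displayed text]]`, optionally with a `#section` anchor. A minimal regex sketch for extracting link targets from a sentence's wikitext; it assumes links are not nested, and the lowercasing and underscore replacement are normalization choices of this sketch rather than documented steps of the study.

```python
import re

# Matches [[Target]], [[Target|label]], and [[Target#Section|label]];
# group 1 captures the target title.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def wikilinked_terms(wikitext):
    """Return the normalized link targets of the wiki links in a piece of wikitext."""
    return [
        m.group(1).strip().replace("_", " ").lower()
        for m in WIKILINK.finditer(wikitext)
    ]
```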

The relevance value $c_{n}^{p}(W_{ij})$ of a wiki-linked term $W_{i}$ in a sentence $S_{j}$ revised by an editor $E_{p}$ in revision $R_{n}$ equals the inverse of the total number of wiki links in that sentence, $1/w(S_{j})$, where $w(S_{j})$ denotes the number of wiki links in $S_{j}$ (assuming $r$ revisions of $S_{j}$ have been made and $s$ sentences containing $W_{i}$ are present in one revision). To aggregate relevance values across multiple revisions, $c_{n}^{p}(W_{i})$ is obtained by summing these values over the related sentences from the first to the $n$-th revision by $E_{p}$, indexed by $k$. The article-level term score $c(W_{i})$ is computed over pairs of revisions, each pairing $c_{n}^{p}(W_{i})$ of a revision by $E_{p}$ with $c_{m}^{q}(W_{i})$ of the subsequent revision by $E_{q}$ (the subscript $l$ indexes these pairs): the minima of all pairs are summed, the largest of these minima is subtracted, and the result is multiplied by the number of involved editors $|E|$. The global term score $\tilde{c}(W_{i})$ is obtained by summing the term scores $c(W_{i})$ across all studied articles.

$$c_{n}^{p}(W_{i}) = \sum_{k=1}^{n} \sum_{j=1}^{s} c_{k}^{p}(W_{ij}) = \sum_{k=1}^{n} \sum_{j=1}^{s} \frac{1}{w(S_{j})}, \qquad S_{j} \in R_{k},\ W_{i} \in S_{j}$$

$$c(W_{i}) = |E| \left( \sum_{l=1}^{r} \min\left(c_{n}^{p}(W_{i}), c_{m}^{q}(W_{i})\right)_{l} - \max_{l=1,\dots,r} \min\left(c_{n}^{p}(W_{i}), c_{m}^{q}(W_{i})\right)_{l} \right)$$

Applying this approach, the most disputed content relates to “climate change”⁴ (term score = 31,996.9) and its physical causes, such as “greenhouse gas” (79,046.9), related policies and debates, such as the “intergovernmental panel on climate change” (24,836.1), and energy-related topics, such as “fossil fuel” (16,753.2). Finally, the sentence score $\tilde{c}(S_{j})$ measures the content relevance of a sentence as the average of the global term scores $\tilde{c}(W_{i})$ of the terms it contains.
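The scoring steps above can be sketched as follows. This is one possible reading of the two equations, not the study's implementation, and it rests on several assumptions: each revision is given as its editor plus the wiki-linked terms of each modified sentence; $c_{n}^{p}(W_{i})$ is a running total over the revisions of editor $E_{p}$ that touch the term; pairs are consecutive revisions touching the term, without requiring $E_{q}$ to differ from $E_{p}$; and sentences without wiki links score 0. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def revision_weight(sentences, term):
    """Sum of 1 / w(S_j) over the modified sentences containing the term.

    Each sentence is given as the list of its wiki-linked terms, so w(S_j) = len(links).
    """
    return sum(1.0 / len(links) for links in sentences if term in links)


def article_term_score(revisions, term):
    """c(W_i) for one article; revisions are chronological (editor, sentences) pairs."""
    cumulative = defaultdict(float)  # running c_n^p(W_i) for each editor p
    touched = []  # (editor, cumulative value) for each revision that touches the term
    for editor, sentences in revisions:
        weight = revision_weight(sentences, term)
        if weight == 0:
            continue
        cumulative[editor] += weight
        touched.append((editor, cumulative[editor]))
    if len(touched) < 2:
        return 0.0  # no pair of revisions to compare
    # Minimum of each pair: a revision by E_p and the subsequent revision by E_q.
    mins = [min(a, b) for (_, a), (_, b) in zip(touched, touched[1:])]
    n_editors = len({editor for editor, _ in touched})  # |E|
    return n_editors * (sum(mins) - max(mins))


def global_term_scores(articles, terms):
    """Global score of each term: the sum of its article-level scores."""
    return {t: sum(article_term_score(revs, t) for revs in articles) for t in terms}


def sentence_score(links, global_scores):
    """Average global score of a sentence's wiki-linked terms (0 if it has none)."""
    return mean(global_scores.get(t, 0.0) for t in links) if links else 0.0
```

Subtracting the largest pairwise minimum discounts a single heavy exchange, so a term scores highly only when it is repeatedly involved in revisions by several editors.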

Context: moderation and collaboration

This study identifies (v1) moderation and (v2) collaboration from the edit summary⁵ of each revision. Whereas (v1) moderation concerns ensuring edit quality, by scrutinizing content for accuracy, neutrality, and completeness and by enforcing conduct policies against disruptive behavior, (v2) collaboration involves task communication, characterized by coordination, responses to tasks, or updates on a task’s status. Moderation was further divided into three subcategories: (v1.1) content policies contestation, when the revision invokes the content policies to challenge or dispute the accuracy, neutrality, or verifiability of the information presented in the sentences; (v1.2) conduct policies, when the revision regulates a previous edit that fails to adhere to Wikipedia’s conduct policies on, for example, edit warring or harassment; and (v1.3) climate denial and skepticism, when a revision identifies misleading information from climate skeptics. Finally, (v1.0) contestation encompasses general moderation edits that do not fall into any of these subcategories.

The data to be annotated were first sampled through a cluster-based sampling strategy (Supplemental material) and then selected iteratively through active learning. Two trained coders annotated 13,306 edit summaries (Krippendorff’s alpha: v1 = 0.92, v1.1 = 0.86, v1.2 = 0.84, v1.3 = 1, v2 = 0.75; reliability sample size = 250). Because only 0.6% of the annotated edit summaries concerned climate denial, v1.3 was excluded from classifier training.
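Because each category is a binary label, the reported reliability values can be read as Krippendorff's alpha at the nominal level. A minimal sketch for the two-coder, no-missing-data case, assuming nominal measurement; the study does not state which software computed its alpha values, and the function name is illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal labels, no missing values."""
    # Coincidence matrix: each unit contributes both ordered pairs of its two values.
    o = Counter()
    for a, b in zip(coder1, coder2):
        o[(a, b)] += 1
        o[(b, a)] += 1
    n_c = Counter()
    for (c, _k), count in o.items():
        n_c[c] += count  # marginal total of each category
    n = sum(n_c.values())  # total pairable values (2 x number of units)
    observed = sum(count for (c, k), count in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when all values fall into one category")
    # alpha = 1 - D_o / D_e, with D_o / D_e = (n - 1) * observed / expected
    return 1.0 - (n - 1) * observed / expected
```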

I built a binary classifier for each category by fine-tuning the pre-trained language model DistilBERT (Sanh et al., 2019). Hyperparameters were tuned on the training set over 50 trials, and the model with the best-performing parameter set was used (Table 1). To identify whether a detected revision type refers to a specific sentence or to multiple sentences, I compared (1) the quoted segments in each edit summary with the edited sentences from the same revision and (2) the substantive words (nouns, pronouns, verbs, adjectives, and adverbs) of each sentence in a revision with those of the corresponding edit summary.
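The two matching steps can be sketched as below. This is an illustration under explicit assumptions, not the study's procedure: a small hypothetical stopword list stands in for part-of-speech filtering, only double-quoted segments are recognized, and the `min_overlap` threshold of 2 shared words is invented for the example.

```python
import re

# Hypothetical stand-in for POS filtering: the study keeps nouns, pronouns, verbs,
# adjectives, and adverbs; this sketch simply drops a few function words.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "are", "was", "for", "on"}


def content_words(text):
    """Lowercased words of a text, minus the stopwords above."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def summary_targets_sentence(summary, sentence, min_overlap=2):
    """Return True if an edit summary plausibly refers to this sentence."""
    # (1) A quoted segment of the summary appears verbatim in the sentence.
    for quote in re.findall(r'"([^"]+)"', summary):
        if quote.lower() in sentence.lower():
            return True
    # (2) Otherwise, require enough shared content words.
    return len(content_words(summary) & content_words(sentence)) >= min_overlap
```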

Table 1.

Model performance.

	Average accuracy (validation set)	Final accuracy (test set)	Average F1 score (validation set)	Final F1 score (test set)
v1	0.88	0.87	0.87	0.86
v1.1	0.94	0.94	0.76	0.77
v1.2	0.98	0.99	0.89	0.92
v2	0.94	0.95	0.74	0.76

Editor’s role

Editors⁶ are classified into five mutually exclusive categories: (1) special role: non-automated editors with extended rights that are not granted automatically, such as administrators; (2) bots: automated or semi-automated accounts approved to operate as bots; (3) unregistered: anonymous, unregistered users; (4) blocked: editors who were involved in editing but were later blocked; and (5) common: all remaining editors.
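Mutually exclusive categories require a precedence order when an account qualifies for several (for example, a blocked administrator). The sketch below assumes one such order (unregistered, bots, blocked, special role, common); the order, the input dictionary keys, and the illustrative subset of manually granted English Wikipedia user groups are assumptions, not the study's documented rules.

```python
# Illustrative subset of user groups that are granted manually rather than automatically.
SPECIAL_GROUPS = {"sysop", "bureaucrat", "rollbacker", "reviewer"}


def editor_role(editor):
    """Assign one of five mutually exclusive roles.

    editor: dict with hypothetical keys 'registered' (bool), 'groups' (set of
    user-group names), and 'blocked' (bool). The precedence order is an assumption.
    """
    if not editor.get("registered", True):
        return "unregistered"
    groups = editor.get("groups", set())
    if "bot" in groups:
        return "bots"
    if editor.get("blocked", False):
        return "blocked"
    if groups & SPECIAL_GROUPS:
        return "special role"
    return "common"
```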

Editor’s engagement

The editor’s prior engagement with the sentence indicates whether and how often they have engaged with it. Following Keegan et al. (2016), each revision i of a sentence is assigned to one of five groups according to its editor’s prior engagement: (1) solo: the same editor made the immediately preceding revision (i-1); (2) reactive (ping-pong): the editor made revision i-2 and another editor made revision i-1, producing back-and-forth editing; (3) reactive (recent): the editor made one of the three revisions before that (i-3 to i-5); (4) reactive (inactive): the editor made a prior revision before i-5; and (5) newcomer: the editor has no record of previously editing the sentence.
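The five groups can be applied in order, from the nearest prior revision outward, so that each revision receives exactly one label. A minimal sketch, assuming the editors of a sentence's revisions are available as a chronologically ordered list; the function names are illustrative.

```python
def engagement(editors, i):
    """Classify revision i of a sentence by its editor's prior engagement.

    editors: editor names of the sentence's revisions in chronological order (0-indexed).
    """
    e = editors[i]
    if i >= 1 and editors[i - 1] == e:
        return "solo"
    if i >= 2 and editors[i - 2] == e:  # i-1 is another editor, since solo was ruled out
        return "ping-pong"
    if e in editors[max(0, i - 5):max(0, i - 2)]:  # positions i-5 to i-3
        return "recent"
    if e in editors[:max(0, i - 5)]:  # positions before i-5
        return "inactive"
    return "newcomer"


def engagement_sequence(editors):
    """Label every revision of a sentence."""
    return [engagement(editors, i) for i in range(len(editors))]
```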

Shared frailty models and meta-analysis

Survival analysis has been used in fields such as medicine, to model the occurrence of disease following treatment interventions, and mechanical engineering, to predict the risk of component failure. In this study, it was employed to estimate the revision hazard, that is, the instantaneous rate at which a revision occurs for an edit with given characteristics, conditional on no revision having occurred yet. It thus identifies which edits with specific characteristics are more likely to “survive” revision. Unlike regression models for continuous dependent variables, survival regression models handle time-to-event data and use censoring to obtain accurate estimates from incomplete event histories. Survival analysis estimates how factors affect the hazard of an event relative to a reference group at a given time, expressed as a hazard ratio (HR). An HR greater than 1 indicates that an individual or group has a higher hazard rate than the reference group, whereas an HR lower than 1 indicates a lower hazard rate. For example, an HR of 1.2 implies that the hazard rate for an individual or group is 20% higher than that of the reference group, meaning the event is expected to happen sooner.
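The study's own models are Cox-type frailty models fitted in R, which the sketch below does not reproduce. It only illustrates, with the simpler Kaplan-Meier estimator, how right-censored edits (still unrevised when observation ends) stay in the risk set until they leave it without counting as revisions. Function and argument names are hypothetical.

```python
def kaplan_meier(durations, revised):
    """Kaplan-Meier estimate of the share of edits still unrevised after time t.

    durations: time until the edit was revised, or until observation ended.
    revised: True if a revision was observed, False if right-censored.
    Returns (time, survival probability) at each observed revision time.
    """
    data = sorted(zip(durations, revised))
    at_risk = len(data)  # edits neither revised nor censored yet
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        events = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            events += data[i][1]  # True counts as 1
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored edits leave the risk set without an event
    return curve
```

With durations `[1, 2, 2, 3, 4]` and outcomes `[True, False, True, True, False]`, the curve steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the censored edit at time 2 shrinks the risk set for time 3 without lowering the curve itself.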

The data structure of this study is nested: revisions are grouped within sentences, and sentences within articles. Therefore, the shared frailty model, a variant of survival analysis that accounts for the dependence among survival times of recurrent events within the same cluster (Balan and Putter, 2020; Therneau, 2015), was applied. By conditioning on a sentence-level frailty, the model allows baseline revision hazards to differ between sentences while estimating the effects of covariates on the revisions of each sentence. The dependent variable was the time to, and occurrence of, revision. The independent variables (Table 2) at the sentence level were the content relevance (content) and the start time of the first edit (start). At the revision level, they were the revision time (time) and the count of previous revisions (parity). In addition, the revision hazard depends not only on the characteristics of the edits being revised (edits_revised) but also on those of the follow-up edits that revise them (edits_revising). Thus, I incorporated the editor’s prior activity (engagement), access level (role), and contextual information (moderation/collaboration) as key variables for both edits_revised and edits_revising.
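For reference, a mixed-effects Cox model of the kind fitted by coxme can be written as follows, with $s$ indexing sentences and $i$ revision steps as in Figure 2. This is the generic form with a Gaussian log-frailty, offered as an assumption about the model family; the exact formula used in the study is not reported in this section. Here $\lambda_{0}(t)$ is an unspecified baseline hazard, $\mathbf{x}_{s,i}$ collects the covariates listed above, and $b_{s}$ is the frailty shared by all revisions of sentence $s$.

$$\lambda_{s,i}(t) = \lambda_{0}(t)\,\exp\left(\mathbf{x}_{s,i}^{\top}\boldsymbol{\beta} + b_{s}\right), \qquad b_{s} \sim \mathcal{N}(0, \sigma^{2}), \qquad \mathrm{HR}_{k} = \exp(\beta_{k})$$

Under this form, a coefficient of $\beta_{k} = \ln 1.2 \approx 0.18$ corresponds to the HR of 1.2 in the example above, and $\exp(b_{s}) > 1$ marks a sentence that is revised more readily than average regardless of its covariates.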

Table 2.

Variables overview.

	Time	Content	Editor	Context
Article	Time of the first revision	Article score	Number of unique editors	Number of revised sentences
Sentence	Time of the first revision	Sentence score	Number of unique editors	Number of revisions
Revision	Time of revision		Roles: special role, bots, registered, unregistered	Moderation: contestation, content policies related, conduct policies related
Revision	Parity		Previous engagement: solo, reactive (ping-pong), reactive (recent), reactive (inactive), newcomer	Collaboration

I fitted models to the revisions of the sentences in each article separately and used meta-analytic approaches to synthesize the effect sizes across articles. Fitting the models article by article acknowledges that each article operates within its own ecosystem, with different flows of engaged editors, popularity, and content (Lerner and Lomi, 2020). For each article, I estimated (1) a revising model that examines which edits_revising are more likely to occur and (2) a revised model that examines which edits_revised are more likely to survive revision. The time studied was the duration between consecutive revisions. In the revising model, the observation interval ran from the edit preceding the first revision to the last revision. In the revised model, it ran from the first revision to the end of data collection, and edits that were not revised further were right-censored (see Figure 2). The frailty models were estimated using the R package coxme (Therneau, 2015). After excluding models that failed to converge, primarily because there were too few data points to identify the various effects, the revising models of 478 articles and the revised models of 516 articles were valid (Supplemental material). Together, they cover 537 articles with 76,525 sentences modified 210,391 times in 140,593 revisions by 48,643 editors (Table 3).
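The revised-model intervals can be sketched for a single sentence as follows, assuming its revision times are available in Unix seconds and that the data-collection cutoff is passed in as a parameter. The function name is illustrative, and the sketch covers only the revised model, not the revising model's interval definition.

```python
def revised_model_spells(timestamps, cutoff):
    """(duration, event) spells for the edits of one sentence in the revised model.

    Each edit is followed until the next revision of the sentence (event = 1);
    the last edit is right-censored at the data-collection cutoff (event = 0).
    """
    ts = sorted(timestamps)
    spells = [(later - earlier, 1) for earlier, later in zip(ts, ts[1:])]
    spells.append((cutoff - ts[-1], 0))  # still unrevised when observation ends
    return spells
```

For revision times 100, 160, and 400 and a cutoff of 1000, this yields `[(60, 1), (240, 1), (600, 0)]`: two observed revisions and one censored edit.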

Figure 2.

Visualization of the independent variables (IVs) and dependent variable (DV) in the two frailty models used in the study, illustrated with one article containing a single modified sentence. For each revision of sentence s at a given step i, the pair formed by the current revision E(s, i) (highlighted in blue) and the preceding revision E(s, i–1) (highlighted in green) influences the revision hazard.

Table 3.

Overview of the variable statistics.

	Article level (N = 537)					Sentence level (N = 76,525)
	Min	Max	Mean	Median	SD	Min	Max	Mean	Median	SD
General
Revision (Frequency)	19	8382	392	189	608	1	252	3	1	4
Editor (Frequency)	7	1422	146	81	186	1	187	2	1	3
Content (relevance score)	0	79047	906	21	4975	0	86864	2968	99	10124
Time (First revision, Unix timestamp)	996431097	1632159870	1199912430	1163510277	156510567	996431097	1638855991	1342110091	1305559785	165680679
Editor role (Frequency)
Common editor	4	3239	120	60	207	0	54	1	1	1
Special role	3	3467	138	62	234	0	78	1	1	2
Bot	0	182	26	17	28	0	9	0	0	1
Unregistered	0	1219	90	36	151	0	125	1	0	2
Blocked	0	375	18	7	33	0	58	0	0	1
Editor’s engagement (Frequency)
Newcomer	18	6542	347	175	515	1	187	2	1	3
Solo	1	924	28	13	56	0	27	0	0	1
Ping-pong	0	445	9	2	27	0	32	0	0	0
Recent	0	246	4	1	13	0	10	0	0	0
Inactive	0	225	3	0	12	0	30	0	0	0
Moderation (Frequency)
No moderation	18	5692	301	155	430	0	164	2	1	3
Contestation	0	1913	54	16	124	0	97	0	0	1
Content	0	535	13	4	32	0	34	0	0	0
Conduct	0	295	23	5	47	0	34	0	0	1
Content & Conduct	0	10	0	0	1	0	3	0	0	0
Collaboration (Frequency)
No collaboration	17	7689	379	184	580	0	252	3	1	4
Collaboration	1	693	12	5	35	0	14	0	0	0

The effect sizes of each model were synthesized using the R packages meta (Schwarzer, 2007) and dmetar (Harrer et al., 2019). For each variable, I conducted vote counting, a random-effects meta-analysis, a between-article heterogeneity test, and a meta-regression analysis. Vote counting provided an overview of the direction and statistical significance of the HRs. To account for the varying characteristics of Wikipedia articles, I pooled the effect sizes across articles with random-effects models that incorporate between-article heterogeneity. Heterogeneity tests then estimated how much each effect size varied across articles. Meta-regressions tested article-level factors that could moderate the effect sizes on revision hazards as between-article moderators: article score,⁷ number of revised sentences, number of unique editors, and time of the first revision. Moreover, 31 articles in the dataset were under indefinite semi-protection, so unregistered editors and non-autoconfirmed editors (accounts less than 4 days old or with fewer than 10 edits) could not edit them. As protection can change editing dynamics, especially for categories such as editor's role and moderation (Ajmani et al., 2023; Hill and Shaw, 2015), subgroup analyses compared protected and non-protected articles.
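The pooling and vote-counting steps can be sketched in Python as follows, assuming that each article's frailty model supplies, for a given variable, a log hazard ratio and its standard error. The between-article variance is estimated here with the DerSimonian-Laird method for simplicity; the text does not state which estimator was used, and the meta package offers several. The function names are hypothetical and do not reproduce the meta or dmetar APIs. The ratio returned by `vote_count` follows the convention that appears in Tables 4 and 5, where each ratio is the share of significant articles falling in that direction.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def vote_count(log_hrs, ses, z=Z975):
    """Count articles with a significantly positive / negative log-HR (Wald test).

    Returns (n_positive, n_negative, (ratio_positive, ratio_negative)), where the
    ratios are shares of all significant articles.
    """
    pos = sum(1 for b, s in zip(log_hrs, ses) if b / s > z)
    neg = sum(1 for b, s in zip(log_hrs, ses) if b / s < -z)
    n_sig = pos + neg
    ratio = (pos / n_sig, neg / n_sig) if n_sig else (float("nan"), float("nan"))
    return pos, neg, ratio


def random_effects_pool(log_hrs, ses, z=Z975):
    """DerSimonian-Laird random-effects pooling of per-article log hazard ratios.

    Returns (pooled HR, (lower, upper) 95% CI on the HR scale, tau^2).
    """
    w = [1 / s**2 for s in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran's Q
    df = len(log_hrs) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-article variance
    w_re = [1 / (s**2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (math.exp(pooled - z * se), math.exp(pooled + z * se))
    return math.exp(pooled), ci, tau2


# Hypothetical per-article estimates (log-HR, SE) for one variable.
log_hrs = [0.45, 0.60, 0.10, -0.05]
ses = [0.10, 0.15, 0.20, 0.12]
print(vote_count(log_hrs, ses))
print(random_effects_pool(log_hrs, ses))
```

Because pooling happens on the log scale, the pooled HR and its confidence limits are exponentiated back at the end. An article with a very small standard error dominates the fixed-effect weights, while a larger tau² spreads weight more evenly across articles.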

Findings

Vote counting and random-effects analysis

Temporal and content factors

According to the results (see Tables 4 and 5), the pooled effect sizes of all variables related to time, parity, and content were close to 1 (HRs ≈ 1.0), likely because these variables are measured on large numeric scales, so the effect of a one-unit change is minute. At the sentence level, the revision hazard of the edits_revising was lower when the first revision started later (115 significantly negative cases vs 76 significantly positive cases). Higher sentence content scores were associated with lower hazards for both the edits_revising (17 negative cases) and the edits_revised (56 negatives). For each revision, more articles showed significantly negative than positive effects of the revision time of the edits_revised (112 negatives vs 26 positives). Each additional revision (parity) was associated with a lower revision hazard in most articles, for both the edits_revised (HR = 0.979, 95% confidence interval [CI]: [0.975, 0.982]; 160 negatives) and the edits_revising (HR = 0.955, CI: [0.95, 0.96]; 252 negatives).
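Why can an HR print as 1.0 and still be significant? Under the Cox model a covariate acts multiplicatively, so a change of Δ units gives HR(Δ) = exp(βΔ). The Python sketch below assumes, purely for illustration, a time covariate measured in seconds and a hypothetical coefficient of -1e-9 per second; neither value is taken from the study's estimates, and `hr_for_change` is a hypothetical helper name.

```python
import math


def hr_for_change(beta, delta):
    """Hazard ratio for a change of `delta` units in a covariate whose Cox
    coefficient (log-HR per unit) is `beta`: HR(delta) = exp(beta * delta)."""
    return math.exp(beta * delta)


SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds

beta = -1e-9  # hypothetical per-second coefficient, NOT an estimate from the study
per_second = hr_for_change(beta, 1)
per_year = hr_for_change(beta, SECONDS_PER_YEAR)
print(f"HR per second: {per_second:.3f}")  # rounds to 1.000
print(f"HR per year:   {per_year:.3f}")    # about 0.969
```

Rescaling the covariate (for example, to years) changes the reported HR but not the underlying association or its significance.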

Table 4.

Vote counting and random-effects models that synthesize the revised models.

	Pooled effect size (HR) [95% CI]	p-value	n_significantly positive/ratio	n_significantly negative/ratio	k
Sentence
Start	1.0 [1.0, 1.0]	.537	26/0.46	31/0.54	516
Content	1.0 [1.0, 1.0]	<.001***	7/0.11	56/0.89	516
Revision
Time	1.0 [1.0, 1.0]	<.001***	26/0.19	112/0.81	516
Parity	0.979 [0.975, 0.982]	<.001***	25/0.14	160/0.86	516
Revision: Editor’s role (Ref: common editors)
Special roles	0.935 [0.908, 0.963]	<.001***	26/0.31	59/0.69	516
Bots	1.149 [1.102, 1.2]	<.001***	48/0.79	13/0.21	513
Unregistered	1.647 [1.589, 1.707]	<.001***	213/0.97	7/0.03	515
Blocked	1.317 [1.245, 1.394]	<.001***	69/0.88	9/0.12	449
Revision: Editor’s engagement (Ref: newcomers)
Solo	0.929 [0.895, 0.965]	<.001***	19/0.31	43/0.69	516
Ping-pong	1.39 [1.293, 1.495]	<.001***	62/0.87	9/0.13	384
Recent	1.189 [1.106, 1.276]	<.001***	21/0.88	3/0.12	314
Inactive	1.149 [1.057, 1.25]	<.01**	9/0.64	5/0.36	160
Revision: Moderation (Ref: no moderation)
Contestation	0.923 [0.89, 0.956]	<.001***	26/0.36	46/0.64	509
Content	0.988 [0.938, 1.041]	.645	22/0.59	15/0.41	430
Conduct	0.873 [0.836, 0.911]	<.001***	21/0.4	31/0.6	440
Content & Conduct	1.391 [1.141, 1.696]	<.01**	8/1.0	0/0.0	111
Revision: Collaboration (Ref: no collaboration)
Collaboration	1.075 [0.997, 1.158]	<.1	24/0.6	16/0.4	516

Note. Significance codes: * = p < .05; ** = p < .01; *** = p < .001.

Table 5.

Vote counting and random-effects models that synthesize the revising models.

	Pooled effect size (HR) [95% CI]	p-value	n_significantly positive/ratio	n_significantly negative/ratio	k
Sentence
Start	1.0 [1.0, 1.0]	<.01**	76/0.4	115/0.6	478
Content	1.0 [1.0, 1.0]	<.05*	9/0.35	17/0.65	478
Revision
Time	1.0 [1.0, 1.0]	.165	106/0.5	105/0.5	478
Parity	0.955 [0.95, 0.96]	<.001***	41/0.14	252/0.86	478
Revision: Editor’s role (Ref: common editors)
Special roles	0.963 [0.932, 0.993]	<.05*	50/0.43	66/0.57	478
Bots	0.687 [0.652, 0.723]	<.001***	7/0.06	110/0.94	476
Unregistered	1.141 [1.105, 1.178]	<.001***	73/0.74	26/0.26	478
Blocked	1.172 [1.11, 1.239]	<.001***	51/0.67	25/0.33	428
Revision: Editor’s engagement (Ref: newcomers)
Solo	3.04 [2.861, 3.235]	<.001***	315/0.98	8/0.02	478
Ping-pong	2.507 [2.326, 2.705]	<.001***	155/0.99	1/0.01	374
Recent	1.733 [1.602, 1.876]	<.001***	67/0.97	2/0.03	306
Inactive	1.696 [1.536, 1.872]	<.001***	33/1.0	0/0.0	158
Revision: Moderation (Ref: no moderation)
Contestation	1.747 [1.677, 1.822]	<.001***	187/0.97	6/0.03	474
Content	1.229 [1.162, 1.298]	<.001***	55/0.83	11/0.17	416
Conduct	2.779 [2.622, 2.945]	<.001***	225/0.99	2/0.01	413
Content & Conduct	2.995 [2.27, 3.951]	<.001***	23/0.96	1/0.04	107
Revision: Collaboration (Ref: no collaboration)
Collaboration	1.037 [0.989, 1.087]	.131	23/0.62	14/0.38	478

Note. Significance codes: * = p < .05; ** = p < .01; *** = p < .001.

Editor’s role

The edits_revised of unregistered editors (HR = 1.647, CI: [1.589, 1.707]; 213 positives), blocked editors (HR = 1.317, CI: [1.245, 1.394]; 69 positives), and bots (HR = 1.149, CI: [1.102, 1.2]; 48 positives) were more likely to be revised than those of common editors. By contrast, the edits_revised submitted by editors with special roles (HR = 0.935, CI: [0.908, 0.963]; 59 negatives) had lower revision risks. Regarding the editor's role in the edits_revising, edits by unregistered editors (HR = 1.141, CI: [1.105, 1.178]; 73 positives), despite a small overall effect size, and by blocked editors (HR = 1.172, CI: [1.11, 1.239]; 51 positives) were positively associated with revision hazard, whereas edits by bots, with a large overall effect size (HR = 0.687, CI: [0.652, 0.723]; 110 negatives), and by editors with special roles (HR = 0.963, CI: [0.932, 0.993]; 66 negatives) were mainly negatively associated with the occurrence of revisions.

Editor’s engagement

All editor-engagement variables in the edits_revising had positive, statistically significant pooled effect sizes. Compared with edits_revising by newcomer editors, revision hazards were higher when the edits_revising were made by reactive editors who had previously edited the sentence. Among reactive editors, edits_revising by solo editors who extended their previous edits with another revision (HR = 3.04, CI: [2.861, 3.235]; 315 positives) and by ping-pong-playing editors who modified the edits_revising right before (HR = 2.507, CI: [2.326, 2.705]; 155 positives) showed particularly strong positive associations with revision hazards. For the edits_revised, the largest effect size in this category was among ping-pong-playing editors (HR = 1.39, CI: [1.293, 1.495]; 62 positives) and the smallest among solo editors (HR = 0.929, CI: [0.895, 0.965]; 43 negatives), implying that edits made by solo editors are less likely to be revised.

Contextual dependencies: moderation and collaboration

Edits_revising were more likely to occur when they moderated a previous edit, particularly when this moderation was based on both content and conduct policies (HR = 2.995, CI: [2.27, 3.951]; 23 positives), on conduct policies (HR = 2.779, CI: [2.622, 2.945]; 225 positives), or on general contestation (HR = 1.747, CI: [1.677, 1.822]; 187 positives). Revision hazard was also positively related to edits_revising that revised previous edits violating content policies (HR = 1.229, CI: [1.162, 1.298]; 55 positives). By contrast, moderation effects in the edits_revised were less consistent across articles, and the pooled effect size of content-policy moderation was not significant (p = .645). Edits_revised that had been modified due to contestation (HR = 0.923, CI: [0.89, 0.956]; 46 negatives) or conduct-policy moderation (HR = 0.873, CI: [0.836, 0.911]; 31 negatives) had lower revision hazards, whereas those moderated based on both content and conduct policies had higher revision hazards (HR = 1.391, CI: [1.141, 1.696]; 8 positives). Collaboration-related edits were not significantly associated with revision hazard at the .05 level.

Heterogeneity analysis, meta-regression, and subgroup analysis

The between-article heterogeneity tests, which assessed how much the effect size of each variable varied across articles, showed statistically significant heterogeneity for all variables; some variables' effects depended strongly on specific article-level characteristics (Supplemental material). Meta-regressions tested the four article-level moderators: article score, number of revised sentences, number of unique editors, and time of the first revision (Supplemental material). Of these, only article score did not moderate any between-article heterogeneity in effect size estimates. The subgroup analysis further showed that effects differed significantly between protected and non-protected articles for multiple variables (Supplemental material). Notably, in protected articles, the revision risks of edits_revised related to contestation and conduct-policy enforcement, and of edits_revised made by special role editors or bots, were smaller than in non-protected articles, whereas contestation in the edits_revising was associated with a greater likelihood of revision in protected than in non-protected articles.
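The heterogeneity and subgroup comparisons can be sketched as follows, assuming each article contributes a log hazard ratio and its standard error for a variable. Cochran's Q and Higgins' I² are standard between-study heterogeneity statistics; `subgroup_test` compares two pooled estimates (for example, protected versus non-protected articles) with a one-degree-of-freedom Q test. Both functions are hypothetical, illustrative helpers rather than the meta package's implementation, and the study's exact test settings are not specified in this section.

```python
import math


def heterogeneity(log_hrs, ses):
    """Cochran's Q, its degrees of freedom, and Higgins' I^2 (in %) for log-HRs."""
    w = [1 / s**2 for s in ses]  # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    df = len(log_hrs) - 1
    # I^2: share of total variation attributable to between-article differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


def subgroup_test(est_a, se_a, est_b, se_b):
    """Q test for a difference between two subgroup pooled log-HRs (df = 1).

    For df = 1 the chi-square upper-tail p-value equals erfc(sqrt(Q / 2)).
    """
    q_between, _, _ = heterogeneity([est_a, est_b], [se_a, se_b])
    return q_between, math.erfc(math.sqrt(q_between / 2))


# Hypothetical pooled log-HRs (and SEs) for protected vs non-protected articles.
print(subgroup_test(-0.20, 0.06, -0.05, 0.03))
print(heterogeneity([0.45, 0.60, 0.10, -0.05], [0.10, 0.15, 0.20, 0.12]))
```

For two groups, Q reduces to (θa − θb)² / (SEa² + SEb²), so the subgroup test is equivalent to a z-test on the difference between the two pooled estimates.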

Summary

Across the 537 articles, statistically significant pooled effect sizes were more common in the revising models (examining which edits_revising are likely to occur) than in the revised models (analyzing which edits_revised are likely to persist). The effect sizes for the edits_revising mainly fell into two category groups: editors' engagement and contextual dependencies. Regarding editors' engagement, solo and ping-pong-playing editors were more likely to contribute to revisions than newcomer editors. Although newcomer editors who modified sentences only once were the most common, their edits were associated with lower revision hazards than those of reactive editors. Regarding contextual dependencies, revision hazard was higher when edits_revising responded to violations of conduct policies or of both content and conduct policies.

The edits_revised of reactive editors were positively associated with revision hazard, with the exception of solo editors, whose edits were less likely to be revised. Moderation-based edits_revised, except those driven by enforcement of both content and conduct policies, were negatively related to revision hazard, suggesting that conflicts are relatively uncommon and do not extend over edit sessions. The editor's role category group was crucial in the models of edits_revised: blocked and unregistered editors, whose edits were more likely to be revised, destabilized collective knowledge construction. By contrast, edits_revised by special role editors had lower revision risks. This echoes previous Wikipedia research on power concentration (Shaw and Hill, 2014) showing that experienced editors make more reliable edits that are unlikely to be revised. However, edits_revising made by special role editors or bots did not increase the likelihood of revision; both were associated with lower revision hazards than those of common editors. At the same time, edits_revised made by bots had a higher risk of being revised than those of common editors.

The association between collaboration and revision hazards was not statistically significant in the models of either the edits_revised or the edits_revising. This is consistent with previous research (Bipat et al., 2018; Morgan et al., 2013; Viegas et al., 2007) indicating that coordination on Wikipedia tends to take place indirectly on talk pages rather than through the edits themselves. Edits whose summaries point to the talk page as a starting point, or that result from negotiations there, were rarer than moderated edits carried out independently. Regarding temporal and content factors, revisions were less likely to occur when the first revision of the sentence started later or when content relevance was higher, and likewise when the edits_revised were made later or had already been revised more often.

Conclusion and discussion

In response to Faraj et al.’s (2011: 1235) call to examine “the connections of ideas along the flow of people” in knowledge collaboration, I explored the challenge of understanding how collective knowledge is shaped and maintained on Wikipedia by shifting from an actor-centric focus to a sentence-level analysis of revision mechanisms. By considering modified sentences as carriers of collective knowledge, I investigated how individual pieces of knowledge evolve within Wikipedia’s ecosystem. I identified and analyzed the editorial, contextual, content, and temporal factors that influence the revision mechanisms by employing survival analysis and meta-analysis to model their impact.

The findings highlight tensions between fostering a self-organized culture and managing the project efficiently and coherently, with responses emerging through peer and bureaucratic control. On the one hand, conflicts arise among editors over how knowledge on Wikipedia should be shaped, as revisions are mainly triggered by moderation and facilitated by reactive editors consistently monitoring revisions. On the other hand, Wikipedia’s open model still ensures the stability and maintenance of knowledge through its revision mechanisms. While some revisions stem from general contestations, many are driven by efforts to regulate collective knowledge through established community policies and automated tools. Disruptive edits, such as those of blocked editors and those that violate community-defined rules, are effectively identified and corrected. Content can remain stable even amid frequent revisions, as edits that have previously been moderated or contested are less likely to be revised.

Epistemic power in the revision processes primarily resides in and is negotiated through collective endeavors of mutual observation, iterative refinement, and community moderation. Editors’ engagement in revising sentences, particularly through solo and ping-pong-style revisions, significantly impacts subsequent revisions. This highlights the presence of self-regulation and editorial oversight. Meanwhile, editors’ roles are not as crucial as their engagement or moderation when prompting revisions. Although edits made by elite editors have a lower risk of being revised, revisions are not necessarily driven by them. This aligns with previous research (Jemielniak, 2014; Pentzold, 2017), which suggests that while elite editors may possess more power, this power’s scope is limited. Therefore, the epistemic power in Wikipedia is not concentrated in the hands of a few elite editors with special roles but rather distributed and negotiated through collective processes enforced by highly committed editors who constantly surveil their knowledge areas, ready to engage in conflicts and contestations that shape the very fabric of collective knowledge.

Given that climate change is the case studied here, it is unsurprising that climate change, mitigation policies, and energy-related issues were the most relevant topics in disputed revisions. However, the association between revision and content relevance at the sentence level was only slightly positive in a few instances. Similarly, content relevance at the article level (article score) did not moderate the effect of any of the variables. Once the effects of moderation-driven revisions and prior editor engagement are taken into account, Wikipedia's knowledge collaborations appear to be shaped less by subject-specific concerns than by the broader imperative to converge collective knowledge in line with the shared understanding of engaged editors at a given moment. This dynamic suggests that knowledge construction is driven more by ongoing community moderation than by the inherent relevance of the content itself. As for generalizability, this study's analysis of a large-scale Wikipedia dataset related to climate change confirms several previous findings (Esteves Gonçalves da Costa and Cukierman, 2019; Weltevrede and Borra, 2016). Given its broad scope, spanning a diverse range of articles that contribute to communicating climate change and other knowledge areas, this study provides insights into collaborative knowledge patterns that may be generalizable to other peer-based knowledge production contexts. However, caution is warranted when extrapolating the study's results, as Wikipedia's editor base is heterogeneous in expertise and commitment. In addition, the ideas at stake, the editors involved, and the stage of collaboration also shape revision mechanisms. Indeed, the heterogeneity tests across article models suggest that each article functions within its own ecosystem, underscoring the importance of considering context-specific factors when analyzing knowledge collaborations.

The findings further suggest that contestation and conflict over collective knowledge building are not endless, even in highly contentious subjects like climate change. As argued by McDowell and Vetter (2021), stable consensus can emerge through continuous moderation efforts guided by policies and rules and with millions of viewers engaging with the content. Unlike social media platforms, where epistemic power is often shaped by social networking, influential users, or viral content, Wikipedia’s power dynamics are rooted in the iterative process of constructing and moderating ideas, shaped, reconciled, and revised through an ongoing convergence of participating editors (Faraj et al., 2011). Despite the presence of a hierarchical structure among editors based on their commitment and previous contributions, power is not centralized. Instead, it is broadly distributed across accountability networks of community moderation (Nahon, 2011; Pentzold, 2017).

Meanwhile, it is essential to recognize the variability and diversity of knowledge on Wikipedia and of the assemblage that constructs it. The blurring of boundaries between citizens and professionals, the emergence of temporary contributor roles, the dissipation of knowledge context, and the fluidity of organizational structures (Faraj et al., 2011; Neuberger et al., 2023) all contribute to multifaceted revision mechanisms that maintain resilience against vandalism and misinformation. This resilience emerges from a complex interplay of knowledge, actors, rules, algorithmic tools, and technological systems. However, systemic biases and inclusion challenges persist, as Wikipedia primarily relies on secondary sources that may already reflect existing inequalities. These issues are further amplified by the composition of its editing community, which influences what knowledge is included or excluded (Martini, 2024; McDowell and Vetter, 2021). As Wikipedia continues to serve as a foundational source for large language model training and remains widely cited across social media as an authoritative reference, examining how its representation of knowledge is constructed and maintained becomes crucial. By examining the intricacies of knowledge collaboration and the underlying power dynamics, we can better evaluate Wikipedia's resilience, adaptability, and broader impact on shaping collective understanding of the world.

Limitations

To advance future research on revision mechanisms in Wikipedia, it is crucial to acknowledge the limitations of this study. First, while it identified key variables influencing revision mechanisms, knowledge collaboration on Wikipedia remains highly complex and multifaceted, and the identified factors cannot fully capture the intricate mechanisms underlying revisions. Editor categorization could be refined, for example, by distinguishing superusers or task forces from specific projects and by incorporating other nuanced dimensions, while ensuring that the variables remain uncorrelated. Second, the inference of editorial context related to moderation and collaboration was based solely on edit summaries, so the full scope of discussion and coordination on talk pages was not represented. Future research could leverage large language models and text mining to link talk-page discussions to the resulting revisions and to identify revisions driven by indirect coordination. Third, the computational requirements of the frailty model, combined with the dataset's size and nested structure, posed challenges for data analysis. To manage this complexity, I employed a two-step approach: first fitting frailty models to each article and then applying meta-analysis to synthesize the results. Efficient frailty analysis that accommodates multilevel structures in large-scale datasets, however, remains an open area of exploration. Finally, the methods used in this study did not fully explore the variability between individual Wikipedia articles. Researchers could employ clustering analysis to group individual articles' models by shared characteristics, such as similar term score distributions, and compare revision mechanisms across the resulting groups to uncover further patterns. In addition, qualitative research focusing on specific sentences and topics that undergo frequent revisions or have high content scores could provide nuanced insights into content negotiations and epistemic debates.

Supplemental Material

sj-pdf-1-nms-10.1177_14614448251336418 – Supplemental material for Decoding revision mechanisms in Wikipedia: Collaboration, moderation, and collectivities

Supplemental material, sj-pdf-1-nms-10.1177_14614448251336418 for Decoding revision mechanisms in Wikipedia: Collaboration, moderation, and collectivities by Xixuan Zhang in New Media & Society

Footnotes

Acknowledgements

I would like to thank my advisors and colleagues for their valuable comments and suggestions. I am also grateful to Angelika Juhász and Svea Komm for their dedicated work in annotating the dataset, and to the HPC Service of FUB-IT, Freie Universität Berlin, for providing computing time. I extend my sincere thanks to the reviewer for their constructive feedback and thoughtful suggestions, which significantly improved this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Federal Ministry of Education and Research of Germany, funding code 16DII125.

ORCID iD

Xixuan Zhang


Author biography

Xixuan Zhang is a research associate and doctoral candidate at the Freie Universität Berlin and the Weizenbaum Institute for the Networked Society. She previously studied Media and Political Communication at Freie Universität Berlin and Media Informatics at Technische Universität Berlin. Her research explores digital activism, online discourse, and the networked public sphere, with a particular interest in computational methods.

References

Ajmani, Vincent and Chancellor (2023) Peer produced friction: how page protection on Wikipedia affects editor engagement and concentration. Proceedings of the ACM on Human-Computer Interaction 7(CSCW2): 1–33.

Anthony, Smith and Williamson (2009) Reputation and reliability in collective goods: the case of the online encyclopedia Wikipedia. Rationality and Society 21(3): 283–306.

Arazy, Daxenberger, Lifshitz-Assaf, et al. (2016) Turbulent stability of emergent roles: the dualistic nature of self-organizing knowledge coproduction. Information Systems Research 27(4): 792–812.

Arazy, Lindberg, Lev, et al. (2020) Emergent routines in peer-production: examining the temporal evolution of Wikipedia's work sequences. ACM Transactions on Social Computing 3(1): 1–24.

Arazy, Ortega, Nov, et al. (2015) Functional roles and career paths in Wikipedia. In: Proceedings of the 18th ACM conference on computer supported cooperative work & social computing, Vancouver, BC, Canada, 14–18 March 2015, pp. 1092–1105. New York: Association for Computing Machinery.

Balan and Putter (2020) A tutorial on frailty models. Statistical Methods in Medical Research 29(11): 3424–3454.

Benkler (2006) The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press.

Beschastnikh, Kriplean and McDonald (2021) Wikipedian self-governance in action: motivating the policy lens. Proceedings of the International AAAI Conference on Web and Social Media 2(1): 27–35.

Bipat, McDonald and Zachry (2018) Do we all talk before we type? Understanding collaboration in Wikipedia language editions. In: Proceedings of the 14th international symposium on open collaboration, Paris, 22–24 August 2018, pp. 1–11. New York: Association for Computing Machinery.

Borra, Weltevrede, Ciuccarelli, et al. (2014) Contropedia: the analysis and visualization of controversies in Wikipedia articles. In: Proceedings of the international symposium on open collaboration, Berlin, 27–29 August 2014. Available at: https://www.researchgate.net/publication/278030058

Bruckman (2022) Should You Believe Wikipedia? Online Communities and the Construction of Knowledge. Cambridge: Cambridge University Press.

Bürger, Schlögl and Schmid-Petri (2023) Conflict dynamics in collaborative knowledge production: a study of network gatekeeping on Wikipedia. Social Networks 72: 13–21.

Butler, Joyce and Pike (2008) Don't look now, but we've created a bureaucracy: the nature and roles of policies and rules in Wikipedia. In: Proceedings of the SIGCHI conference on human factors in computing systems, Florence, 5–10 April 2008, pp. 1101–1110. New York: Association for Computing Machinery.

Champion and Hill (2024) Countering underproduction of peer produced goods. New Media & Society. Epub ahead of print 16 May 2024. DOI: 10.1177/14614448241248139.

Esteves Gonçalves da Costa and Cukierman (2019) How anthropogenic climate change prevailed: a case study of controversies around global warming on Portuguese Wikipedia. New Media & Society 21(10): 2261–2282.

Faraj, Jarvenpaa and Majchrzak (2011) Knowledge collaboration in online communities. Organization Science 22(5): 1224–1239.

Halfaker, Geiger, Morgan, et al. (2013) The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist 57(5): 664–688.

Halfaker, Kittur and Riedl (2011) Don't bite the newbies: how reverts affect the quantity and quality of Wikipedia work. In: Proceedings of the 7th international symposium on wikis and open collaboration, Mountain View, CA, 3–5 October 2011, pp. 163–172. New York: Association for Computing Machinery.

Harrer, Cuijpers, Furukawa, et al. (2019) dmetar: companion R package for the guide “Doing meta-analysis in R” (R package version 0.0.9000). DOI: 10.1201/9781003107347.

Heylighen (2016) Stigmergy as a universal coordination mechanism I: definition and components. Cognitive Systems Research 38: 4–13.

Hill and Shaw (2015) Page protection: another missing dimension of Wikipedia research. In: Proceedings of the 11th international symposium on open collaboration, San Francisco, CA, 19–21 August 2015, pp. 1–4. New York: Association for Computing Machinery.

Hwang and Shaw (2022) Rules and rule-making in the five largest Wikipedias. Proceedings of the International AAAI Conference on Web and Social Media 16: 347–357.

Jemielniak (2014) Common Knowledge? An Ethnography of Wikipedia. Redwood City, CA: Stanford University Press.

Keegan, Lev and Arazy (2016) Analyzing organizational routines in online knowledge collaborations: a case for sequence analysis in CSCW. In: Proceedings of the 19th ACM conference on computer-supported cooperative work & social computing, San Francisco, CA, 27 February–2 March 2016, pp. 1065–1079. New York: Association for Computing Machinery.

König (2013) Wikipedia: between lay participation and elite knowledge representation. Information, Communication & Society 16(2): 160–177.

Lerner and Lomi (2020) The free encyclopedia that anyone can dispute: an analysis of the micro-structural dynamics of positive and negative relations in the production of contentious Wikipedia articles. Social Networks 60: 11–25.

Maki, Yoder, et al. (2017) Roles and success in Wikipedia talk pages: identifying latent patterns of behavior. In: Kondrak and Watanabe (eds) Proceedings of the Eighth International Joint Conference on Natural Language Processing (Vol. 1: Long Papers). Taipei, Taiwan: Asian Federation of Natural Language Processing, pp. 1026–1035.

Martini (2024) Notable enough? The questioning of women's biographies on Wikipedia. Feminist Media Studies 24(8): 1877–1893.

Matias (2019) The civic labor of volunteer moderators online. Social Media + Society 5(2). DOI: 10.1177/2056305119836778.

McDowell and Vetter (2021) Wikipedia and the Representation of Reality. New York: Routledge.

Morgan, Gilbert, Zachry, et al. (2013) A content analysis of WikiProject discussions: toward a typology of coordination language used by virtual teams. In: Proceedings of the 2013 conference on computer supported cooperative work companion, San Antonio, TX, 23–27 February 2013, pp. 231–234. New York: Association for Computing Machinery.

Nahon (2011) Network theory fuzziness of inclusion/exclusion in networks. International Journal of Communication 5: 17.

Neuberger, Bartsch, Fröhlich, et al. (2023) The digital transformation of knowledge order: a model for the analysis of the epistemic crisis. Annals of the International Communication Association 47(2): 180–201.

Niederer and Van Dijck (2010) Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society 12(8): 1368–1387.

Osman (2013) The role of conflict in determining consensus on quality in Wikipedia articles. In: Proceedings of the 9th international symposium on open collaboration, Hong Kong, China, 5–7 August 2013, pp. 1–6. New York: Association for Computing Machinery.

Pentzold (2017) Editorial surveillance and the management of visibility in peer production. International Journal of Communication 11: 20.

Pfister (2011) Networked expertise in the era of many-to-many communication: on Wikipedia and invention. Social Epistemology 25(3): 217–231.

Ren and Yan (2017) Crowd diversity and performance in Wikipedia: the mediating effects of task conflict and communication. In: Proceedings of the 2017 CHI conference on human factors in computing systems, Denver, CO, 6–11 May 2017, pp. 6342–6351. New York: Association for Computing Machinery.

Ren, Zhang and Kraut (2023) How did they build the free encyclopedia? A literature review of collaboration and coordination among Wikipedia editors. ACM Transactions on Computer-Human Interaction 31(1): 1–48.

Rijshouwer, Uitermark and de Koster (2023) Wikipedia: a self-organizing bureaucracy. Information, Communication & Society 26(7): 1285–1302.

Sanh, Debut, Chaumond, et al. (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108.

Schwarzer (2007) meta: an R package for meta-analysis. R News 7(3): 40–45.

Shaw and Hill (2014) Laboratories of oligarchy? How the iron law extends to peer production. Journal of Communication 64(2): 215–238.

Therneau (2015) Mixed effects Cox models. CRAN Repository. Available at: https://doi.org/10.32614/cran.package.coxme

Viegas, Wattenberg, Kriss, et al. (2007) Talk before you type: coordination in Wikipedia. In: Proceedings of the 40th annual Hawaii international conference on system sciences, Waikoloa, HI, p. 78. New York: IEEE.

Weltevrede and Borra (2016) Platform affordances and data practices: the value of dispute on Wikipedia. Big Data & Society 3(1). DOI: 10.1177/2053951716653418.

Wikipedia Contributors (2023) Wikipedia: WikiProject Climate change. Available at: https://en.m.wikipedia.org/wiki/Wikipedia:WikiProject_Climate_change

Wikipedia Contributors (2024) Wikipedia: Wikipedians. Available at: https://en.wikipedia.org/wiki/Wikipedia:Wikipedians
