Abstract
While research has explored the extent of gender bias and the barriers to women's inclusion on English-language Wikipedia, very little research has focused on the problem of racial bias within the encyclopedia. Despite advocacy groups' efforts to incrementally improve representation on Wikipedia, much is unknown regarding how biographies are assessed after creation. Applying a combination of web-scraping, deep learning, natural language processing, and qualitative analysis to pages of academics nominated for deletion on Wikipedia, we demonstrate how Wikipedia's notability guidelines are unequally applied across race and gender. We find that online presence predicts whether a Wikipedia page is kept or deleted for white male academics but that this metric is idiosyncratically applied for female and BIPOC academics. Further, women's pages, regardless of race, were more likely to be deemed “too soon” for Wikipedia. A deeper analysis of the deletion archives reveals that when the tag is used on a woman's biography, it is often applied outside of the community guidelines, referring to her career stage rather than media/online coverage. We argue that awareness of hidden biases on Wikipedia is critical to the objective and equitable application of the notability criteria across race and gender, both on the encyclopedia and beyond.
Introduction
On October 17, 2022, The Washington Post spotlighted the work of Jess Wade, a British physicist who spends her spare time trying to close Wikipedia's gender gap (Page, 2022). The article highlighted the central role volunteers like Wade play in addressing gender and racial disparities on Wikipedia, but obscured a less visible and understood problem—what happens to the biographies after they are added?
Take, for example, an article written by Wade on April 10, 2019, about an American computer scientist and engineer who led the development of algorithms used to capture the first image of a black hole (Elfrink, 2019). Only two days after creation, the page was flagged for deletion. Despite extensive international press coverage of this scientist in publications like The New York Times, CNN, and The Washington Post (Bever, 2019; Boyer, 2019; Koren, 2019; Lou and Ahmed, 2019; Maguire, 2019; Mervosh, 2019), she was deemed “too soon” for a Wikipedia page because “an assistant professor is clearly not notable as a scientist.” Other editors quickly responded, noting that she had received significant press coverage from national news outlets. Eventually, her page received a “snow keep” decision, indicating that her notability might be questionable but that deleting her page would be too much of an uphill battle to pursue (WP:SNOW). This situation is not uncommon: our data indicate that classifying female scientists with in-depth media coverage as “too soon” based on their career stage is indicative of a broader pattern regarding how women's notability is assessed and the extent to which Wikipedia guidelines are equitably interpreted and applied.
Gender inequality on Wikipedia has been well documented (Adams et al., 2019; Ford and Wajcman, 2017; Jemielniak, 2014; Konieczny and Klein, 2018; Menking and Erickson, 2015; Menking et al., 2019; Press and Tripodi, 2021; Reagle and Rhue, 2011). Women represent less than 20% of all English-language biographies (WP:WikiProject Women in Red; Tripodi, 2021). Female academics are less likely to be recognized on Wikipedia than their male counterparts across all fields of study (Luo et al., 2018; Schellekens et al., 2019; Vitulli, 2017). Women's pages display a negative linguistic bias, and the hyperlink structure of women's pages puts them at a disadvantage in terms of page visibility and traffic compared to men's (Wagner et al., 2016).
To improve gender representation on the site, advocacy initiatives (e.g., Women in Red and Art + Feminism) regularly organize groups of new and existing editors to write and improve articles about women. These editing events aim to inspire new writers to create more biographies about women and members of other minority groups in an effort to close the documented “gender gap.” Despite these successes, many women explain that safety concerns hinder their desire to dedicate the time necessary to become a “Wikipedian” (Bryant et al., 2005; Menking et al., 2019). This effort has been further stymied by targeted deletions of biographies of women and people of color (Kramer, 2019). These deletions emphasize the need to deepen existing research on Wikipedia's gender bias as well as its understudied racial bias (Smith, 2015). While recent work has uncovered that women who meet Wikipedia's criteria for inclusion are more likely to be nominated for deletion than men (Tripodi, 2021), a more nuanced understanding is needed of the guidelines used to decide who is “notable” enough for Wikipedia. The bias hidden within decision making on Wikipedia impedes the encyclopedia's ability to truly deliver “the sum of all human knowledge.”
To address this gap, we explored how metrics used to assess notability on Wikipedia (WP:Search Engine Test; “Too Soon”) are applied across biographies of academics. To do so, we first web-scraped biographies of academics nominated for deletion from 2017 to 2020 (n = 843). Next, we created a numerical proxy for each subject's online presence score. This value is meant to emulate Wikipedia's “Search Engine Test” (WP:Search Engine Test), a convenient and common way editors can determine probable notability before nominating a biography for deletion. By creating a numerical value, we could study whether online presence scores were a meaningful predictor of establishing notability and whether a person's perceived gender or race influenced this notability guideline. We also conducted a qualitative analysis of the discussions surrounding deleted biographies labeled “Too Soon” (WP:Too soon). Doing so allowed our research team to assess whether gender and/or racial discrepancies existed in deciding whether a biography was considered notable enough for Wikipedia.
We find that both metrics are implemented idiosyncratically. Our data also indicate that a robust network to monitor new pages added to the site must exist so that pages can be “saved” from deletion. Such findings indicate that notability guidelines are being applied subjectively, creating an environment where implicit biases drive inclusion decisions.
Literature review
Most research on Wikipedia inequality focuses on gender inequality. This dynamic scholarship tends to revolve around a few themes: (1) unequal representation in terms of content and quality, (2) too few women editors and an unhealthy working environment, and (3) disparities in notability criteria and/or the assessment of women's articles.
Content and quality
Of the more than 1.5 million biographies about notable writers, inventors, and academics on English-language Wikipedia, less than 20% are about women (WP:WikiProject Women in Red; Tripodi, 2021). While some might argue that this is indicative of lower achievement, external audits have found that notable women from a variety of fields are missing from the site (Adams et al., 2019; Luo et al., 2018; Reagle and Rhue, 2011). Further, Wikipedia page presence does not correlate with other measures of academic achievement, such as rank, length of career, H-index, and departmental reputation (Adams et al., 2019; Samoilenko and Yasseri, 2014; Schellekens et al., 2019).
Not only are women underrepresented, but articles about women's interests are also underdeveloped (Callahan and Herring, 2011; Lam et al., 2011; Wagner et al., 2016). Women's pages are less likely to have embedded hyperlinks, whereas men's pages are better connected (Wagner et al., 2016). Hyperlinking to other Wikipedia pages is extremely important because internal links work like “magnets,” attracting more editors to improve and expand on the pages (Aaltonen and Seiler, 2014). Moreover, hyperlinking a woman's page to an existing Wikipedia page increases its chances of survival (Vitulli, 2017).
Broken pipelines and safety concerns
Some point to the editorial “gender gap” as a driver for Wikipedia's diversity problem (Torres, 2016). Wikipedia editors are over 87% male, according to a Wikimedia Foundation Survey from 2011 (Wikimedia Foundation, 2018), and attempts by Wikimedia to increase editor representation have been unsuccessful (Vitulli, 2017). Research examining this imbalance has found that the “pipeline” for participation is broken because the interface and environment are not particularly accessible (Hargittai and Shaw, 2015; Jemielniak, 2014; Shaw and Hargittai, 2018).
Despite Wikimedia's rollout of a visual editor designed to make editing easier, women still might not feel safe editing Wikipedia. Research documents how women carefully consider which topics they want to write about so as to avoid harassment (Menking et al., 2019; Menking and Erickson, 2015; Press and Tripodi, 2021). Edit-a-thons might offer a path toward new editor participation (March and Dasgupta, 2020), but a hostile environment deters women from continued participation, as there is little recourse when faced with online sexism (Bear and Collier, 2016; Eckert and Steiner, 2013; Gauthier and Sawchuk, 2017; Jemielniak, 2014; Lam et al., 2011; MacAulay and Visser, 2016; Menking and Erickson, 2015; Paling, 2015; Peake, 2015). The need to create safe spaces also deters women editors from participating in Wikipedia discussions when articles are nominated for deletion because doing so requires a “taxing level of emotional labor” (Menking and Erickson, 2015: 209). Indeed, if Wikipedia wishes to address this gap, it is clear it must address the “underlying culture” required to foster a more inclusive space for women (Menking et al., 2019).
Notability criteria: who stays and who goes
Wagner et al. (2016) suggest that a “higher bar” is set for women subjects on Wikipedia. Wikipedia's “Notability Criteria” were created to support the process of deciding whether a topic deserves a Wikipedia page (WP:Notability). These criteria purposefully leave room for subjectivity and have already been identified as potentially problematic when it comes to reducing gender inequality on the site (Luo et al., 2018; Luyt, 2012). Notability criteria hinge on a subject having significant, independent coverage by reliable sources—most of which is found online (Luyt, 2012; Luyt and Tan, 2010). For academic biographies on Wikipedia, notability is achieved through the significant impact of one's scholarly work on society, the winning of prestigious academic awards, or the holding of important leadership positions at an academic institution or academic journal board (Gauthier and Sawchuk, 2017; Luo et al., 2018; Matei and Dobrescu, 2011). If an article does not meet the notability criteria, it can be nominated for deletion. Each page nominated for deletion has an associated Articles for Deletion (AfD) discussion page (WP:Articles for Deletion). The AfD page serves as a communication and decision-making platform for administrators to gather and debate whether a topic meets the notability criteria (Schneider et al., 2012).
However, classical aspects of notability are notoriously difficult to achieve for groups that have been historically marginalized in society (Luo et al., 2018; Vitulli, 2017). Even if a subject has made significant contributions to academic history, a lack of documentation on women's achievements in the past prevents their existence in the future on Wikipedia (Harrison, 2019; Luo et al., 2018; Wade and Zaringhalam, 2018). Some female academics have had their pages nominated for deletion for not meeting notability criteria surrounding “significant coverage” and “reliable sources” because mainstream coverage of women professionals is limited (Harrison, 2019; Luo et al., 2018).
Nonetheless, even women who meet Wikipedia's stringent guidelines for inclusion might still be nominated for deletion. Research suggests that gender bias impacts the application of notability criteria in the deletion process (Gauthier and Sawchuk, 2017; Kramer, 2019; Vitulli, 2017). Tripodi (2021) found that women who meet Wikipedia's threshold for inclusion are more likely to be categorized as nonnotable and nominated for deletion when compared to men's biographies. Research thus indicates that notability discrepancies surrounding gender abound, but we were unable to locate any systematic study investigating what these criteria for assessing notability might look like and the extent to which they are being helpful or harmful. Do processes like a “search engine test” (WP:Search Engine Test) make women's contributions more visible, decreasing the likelihood of “miscategorization” (Tripodi, 2021)? More importantly, to date, we found no study that attempts to understand how one's perceived racial identity impacts perceived notability.
Intersectional axes
Existing Wikipedia bias research has almost exclusively focused on gender, ignoring racial and ethnic inequality (White, 2018). However, racism on Wikipedia has been documented. Take, for example, the biography of the Black, female nuclear chemist Clarice Phelps. Her page was deleted three times in the span of one week, accompanied by heated arguments among members of the Wikipedia community (Sadeque, 2019). Exploring the role racial discrimination plays in Wikipedia content will move us toward an understanding of how “axes of oppression” (Collins, 1990) impact representation. Intersectionality, as described by Smith (1994) and Crenshaw (1989), can reveal complex interconnections between race and gender. Specifically, we wanted to explore the social processes behind racial classification, and how the social conditions under which race operates serve to reify “subordinate and superordinate groups” (Zuberi, 2001). To do so, we explore how two key metrics of establishing notability on Wikipedia—online presence and press coverage (e.g., WP:Search Engine Test and WP:Too soon)—are applied to biographies of academics and the extent to which their application varies in relation to the subject's perceived racial and gender identity.
Data and methods
To determine if online presence is equitably applied across race and gender, we designed a mixed methods study drawing on web-scraped metadata, machine learning models, natural language processing software, and qualitative content analysis.
Web-scraped metadata
AfD is a process through which Wikipedians can determine whether an article meets the standards for inclusion in the encyclopedia. The AfD archive is a searchable database of every Wikipedia article nominated for deletion. To gather metadata, articles nominated for deletion from November 2017 to August 2020 (n = 843) were scraped from the AfD archive. A script parsed the “Academics and Educators” category of the collected AfD archive, gathering the name, nomination date, and link to the AfD discussion page of every article in the time range. Articles were classified as biographies of academics if the subject specialized in an academic field.
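The parsing step can be sketched with Python's standard-library HTML parser. This is a minimal illustration under stated assumptions: the markup below is invented for the example, and the real AfD log structure is more complex than a flat list of links.

```python
from html.parser import HTMLParser

# Illustrative sample only -- not actual Wikipedia markup.
SAMPLE = """
<div class="afd-entry">
  <a href="/wiki/Wikipedia:Articles_for_deletion/Jane_Doe">Jane Doe</a>
</div>
<div class="afd-entry">
  <a href="/wiki/Wikipedia:Articles_for_deletion/John_Roe">John Roe</a>
</div>
"""

class AfDLinkParser(HTMLParser):
    """Collect (name, AfD discussion link) pairs from a log page."""
    PREFIX = "/wiki/Wikipedia:Articles_for_deletion/"

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None  # href of the <a> tag currently open, if relevant

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith(self.PREFIX):
                self._href = href

    def handle_data(self, data):
        # The text inside a matching <a> tag is the nominated article's name.
        if self._href:
            self.entries.append((data.strip(), self._href))
            self._href = None

parser = AfDLinkParser()
parser.feed(SAMPLE)
print(parser.entries)
```

In the actual pipeline, the fetched daily log pages would be fed to a parser like this one, with the nomination date taken from the log page itself.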
Another script was then used to search for the corresponding Wikidata entry for each academic in our dataset. Wikidata is a database that contains structured information about an academic's gender, occupation, and employer. If data about an individual's gender or employment were not found on Wikidata, those factors were determined manually by the research team by looking for gender pronouns and career information associated with the individual's Google search hits. There were no observed non-binary individuals in the dataset. There was one identified transgender academic in the dataset, who was removed because their transition during 2017–2020 made it difficult to assess how their notability was impacted by their gender identity. After exporting to a final CSV file, the data were manually cleaned and coded in Excel for statistical analysis. A total of 843 biographical entries were included in our final analysis.
Race and ethnicity prediction models
The authors draw from sociological theory, which states that racial categories are not inherent or endogenous, but rather constructed around stereotypes meant to justify inequality (DuBois, 1940; Omi and Winant, 1994; Roberts, 2011; Roth, 2016; Sims et al., 2019; Zuberi, 2001). There is no single characteristic that belongs to a racial group; rather, race is a “biological notion of physical difference grounded in ideology” (Zack, 1995; Zuberi, 2001: xvii). Nonetheless, race is not imaginary; it is a very real political grouping that has actual consequences for people's opportunities in life (Roberts, 2011). Perceptions of a person's race contribute to broader patterns of inequality in society, as the very meaning of race is dependent on the social conditions in which it is embedded (Zuberi, 2001). For example, resumes with “white-sounding names” were more likely to receive call-backs than identical resumes with “African-American-sounding names” (Bertrand and Mullainathan, 2004; Kline et al., 2022). Our approach to using racial or ethnic categorization is not an attempt to assign race, but rather to emulate the social conditions in which race exists on Wikipedia to capture the “racial lens” through which biographies about academics are viewed (Roberts, 2011). Like other audit studies, our research is trying to model human heuristics (Berterolo et al., 2020) to determine how implicit (and explicit) biases impact a person's perceived notability.
To evaluate the relationship between biases and perceived notability, we used publicly available machine learning models from the Python package ethnicolr, developed by Sood and Laohaprapanon (2018). The models are trained on three datasets: Wikipedia data, the 2017 Florida voter registration file, and the 2000 and 2010 American census data. The Wikipedia dataset was built by Ambekar et al. (2009) using decision trees and hidden Markov models. The 2017 Florida voter registration dataset and the census dataset were built by Sood and Laohaprapanon (2018). The precision of race prediction by the model using the Florida dataset was 0.83, that of the census dataset was 0.78, and that of the Wikipedia dataset was 0.73. We applied each model to our dataset to obtain a total of three racial predictions per academic. We then manually evaluated individuals for whom there were discrepancies to try and replicate the process that a Wikipedia administrator may use to perceive another's racial category (e.g., names and racial presentation). An important limitation of this method is that the census and Florida voter registration datasets use racial and ethnic categorizations that are rooted in colonization, slavery, and the assimilation pressures faced by people of color and Indigenous peoples. The very existence of these statistics embodies what Zuberi (2001) refers to as “racial reification” (p. 34) in its attempt to convert abstract concepts of racial difference into a system of racial classification. The historical evolution of whiteness as an identity is accompanied by privilege and entitlements that lead to modern-day barriers to effective change with regards to racial discrimination and violence (Harris, 1993).
As such, our tool is not meant to reproduce such atrocities, nor is the predictive model meant to assume that race is an “essential category.” Rather, the goal of this paper is to demonstrate how “markers” of race are fluid and contingent on a variety of factors like hairstyle or skin (MacLin and Malpass, 2001) as well as the race of the person perceiving another's race (Sims et al., 2019). Thus, this tool is meant to model racial categorization as an interactive process and does not presume the racial identity with which the academics in question might identify. Our process is meant to demonstrate how the realness of race (i.e., oppression) is only feasible through social interactions (Harris, 1993; Markus and Moya, 2010; Roberts, 2011; Sims et al., 2019; Zuberi, 2001) so that we can better understand how one's perceived racial identity might impact their perceived notability on Wikipedia.
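The discrepancy-resolution step described above can be sketched as a majority vote over the three per-model predictions, with non-unanimous cases flagged for the manual review the paper describes. The helper function and label strings here are hypothetical, not part of ethnicolr's API (ethnicolr itself returns per-model probability tables):

```python
from collections import Counter

def consolidate_predictions(wiki_pred, florida_pred, census_pred):
    """Combine the three model outputs for one academic.

    Returns (label, needs_manual_review). A three-way disagreement yields
    no label at all; any non-unanimous result is flagged for manual review.
    """
    votes = Counter([wiki_pred, florida_pred, census_pred])
    label, count = votes.most_common(1)[0]
    if count == 1:  # three-way disagreement: no majority to report
        return None, True
    # A 2-1 split still counts as a discrepancy in the paper's workflow,
    # so flag every case where the three models are not unanimous.
    return label, count < 3

print(consolidate_predictions("white", "white", "white"))  # unanimous case
print(consolidate_predictions("asian", "white", "asian"))  # 2-1 split, flagged
```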
Most articles in our dataset were biographies of white men, while the smallest category was comprised of biographies of Black, Indigenous, and people of color (BIPOC) women (Figure 1). To provide enough statistical power to determine how one's perceived race intersects with notability criteria on Wikipedia, we combined biographies about BIPOC subjects into one group (further addressed in limitations).

Figure 1. Gender and racial composition of Wikipedia biographies of academics nominated for deletion between 2017 and 2020 (total, n = 843).
Online presence notability metric
To determine if WP:Search Engine Test is being equitably utilized, we developed a proxy to model what a search for a subject might yield. This numerical index was developed using natural language processing software from Primer (Primer.ai; Vincent, 2018). In 2018, Primer released software designed to streamline Wikipedia article writing; it processes news articles from a collection of sources and determines which people are notable enough to have a Wikipedia biography (Vincent, 2018). Using the number of times an individual is mentioned in news articles, and in what context, Primer generates a score that quantifies the individual's online coverage. We refer to this online presence score as the “Primer Index.”
To confirm the validity of the “Primer Index,” we also created a “Google Index” to approximate the total number of hits that appear when an academic's full name and occupation are searched on Google. Using a custom Google Sheets script, we extracted each academic's full name and occupation from Wikidata and automatically searched Google for “full name + occupation” for every academic in our dataset on the same day. In this way, we created consistent and accurate search terms so that our results were more likely to be specific to the individual in the dataset. The Google Index reliably reflected our Primer Index results: individuals with high Primer Indices also had high Google Indices. While we recognize that a subject's ability to garner a suitable online score is layered within systems of inequality, this method is meant to recapitulate the act of performing the WP:Search Engine Test. If individuals are being “searched” before nomination, per Wikipedia guidelines, then the Primer Index should reasonably predict whether the individual is deleted from Wikipedia. In other words, if no bias exists, individuals with an active Wikipedia page will have Primer Indices much higher than those who were deleted. Our code is available on GitHub upon reasonable request. We will also consider sharing data with other scholars looking at racial or gender inequality online.
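The agreement between the two indices can be checked with a rank correlation, which is appropriate for skewed coverage scores: if high Primer Indices co-occur with high Google Indices, Spearman's rho approaches 1. This is a stdlib sketch; the score vectors below are invented for illustration, not study data.

```python
def rank(values):
    """1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

primer = [12, 8, 41, 198, 3, 15]           # illustrative scores
google = [900, 400, 3000, 9000, 120, 1100]  # illustrative hit counts
print(round(spearman_rho(primer, google), 3))
```

In practice the same comparison would be run over all 843 subjects; a rho near 1 supports using the Primer Index as a stand-in for the search engine test.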
Content analysis
To explore a second element of notability, we searched all deleted pages (n = 377) for the presence of the Wikipedia shortcut WP:Too soon (n = 61). “Too Soon” is a technical label developed by Wikipedians indicating that a subject lacks sufficient coverage in independent, high-quality news sources to have a page. We conducted a thematic content analysis (Altheide, 2000) of articles labeled WP:Too soon to further understand how WP:Too soon is being applied and if its application is consistent with Wikipedia guidelines.
To ensure accurate comparisons, we also collected the career stage of each individual designated WP:Too soon. Wikidata does not include career stage, so we assigned two research assistants to find and document the career stage of each individual labeled WP:Too soon between 2017 and 2020. Since academics' jobs and career levels fluctuate often, career stage was determined as of the creation date of the AfD page. Since perceived notability among academics is highly contingent on their rank (Adams et al., 2019; WP:Notability (academics)), academic careers were scored by stage (e.g., assistant = 1; associate = 2; etc.).
To recapitulate Wikipedia guidelines and better determine whether notability decisions were contingent on career stage, we relied on Wikipedia criteria for academic notability. Since Wikipedia does not count trainees, research scientists, and/or government workers as “academics,” biographies of subjects who work in these roles received a score of 0 (WP:Notability (academics)). To be clear, the authors do not agree with the Wikipedia guidelines surrounding what defines a “notable academic.” We believe that academic success comes in many forms and is not dependent on holding the position of “professor.” Nonetheless, scoring non-academics as 0 allows us to determine the average career stage of the subjects using Wikipedia's criteria by summing the career stage scores and dividing by the total number of entries.
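The scoring scheme above can be sketched as a lookup table plus an average. Only the assistant/associate scores and the 0 for non-academics come from the text; the remaining entries in the table are assumptions for illustration.

```python
# Career-stage scores; entries beyond assistant/associate are assumed.
STAGE_SCORES = {
    "trainee": 0, "research scientist": 0, "government worker": 0,
    "assistant professor": 1, "associate professor": 2, "full professor": 3,
}

def mean_career_stage(titles):
    """Sum of career-stage scores divided by total entries.

    Titles not in the table (e.g., postdocs) default to 0, mirroring
    Wikipedia's exclusion of non-professors from academic notability.
    """
    scores = [STAGE_SCORES.get(t.lower(), 0) for t in titles]
    return sum(scores) / len(scores)

cohort = ["Assistant Professor", "Postdoc", "Associate Professor"]  # illustrative
print(mean_career_stage(cohort))
```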
To quantify our content analysis, we scored the context in which WP:Too soon was applied in the AfD discussion for the pages that contained the WP:Too soon tag (n = 61). Discussions that relied on the Wikipedia definition of WP:Too soon (i.e., in the context of lack of sources and citations) received a score of zero, discussions that used WP:Too soon outside of the community-established definition (i.e., in the context of career stage) received a score of one, and discussions that used both assessments received a score of two. Given the small sample size, we then performed a Fisher's exact test to determine if there was a significant difference across pages for men and women in the likelihood of using WP:Too soon in the context of the career stage.
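For a 2x2 table, Fisher's exact test needs no statistical library: it sums hypergeometric probabilities over all tables with the same margins that are no more likely than the observed one. A stdlib sketch (the example counts are made up, not our study data):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables sharing the observed margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def p_table(x):
        # Hypergeometric probability that cell (1,1) equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = p_table(a)
    lo = max(0, col1 - row2)  # smallest feasible value of cell (1,1)
    hi = min(row1, col1)      # largest feasible value of cell (1,1)
    # Small tolerance guards against float round-off on exact ties.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-12))

# Illustrative 2x2: career-stage usage of the tag (yes/no) by gender.
print(round(fisher_exact_two_sided(8, 2, 3, 9), 4))
```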
Statistics
All statistical testing and graphing were performed using GraphPad Prism version 9.0.0 for macOS, a graphing and statistics software developed for use in biomedical statistics. Anderson-Darling, D'Agostino-Pearson, Shapiro-Wilk, and Kolmogorov-Smirnov tests for normality and lognormality showed that both the Google Index and Primer Index datasets were non-normally distributed. The Kruskal-Wallis nonparametric test was then performed on the Google and Primer datasets with Dunn's post-hoc multiple comparisons. Fisher's exact tests and chi-squared tests were run in cases of binary categorical comparison. Significance was determined by p < .05. We represent our data as the median plus 95% confidence interval due to the nonparametric distribution of the data.
Our research was driven by a central question: are notability criteria being equitably applied across race and gender? If WP:Search Engine Test is being conducted for all subjects prior to a deletion determination, we would expect to see a meaningful difference in the online presence score of those who were kept and those who were deleted. If WP:Search Engine Test was being equitably applied, there would be no significant difference in these scores when accounting for a subject's race or gender. Likewise, if WP:Too soon is being equitably applied, the label should correlate with news coverage, not the subject's career stage, regardless of ascribed characteristics.
Findings
Online presence (WP:Search Engine Test)
If WP:Search Engine Test is being equitably applied, academics with “kept” articles should have a larger online presence score than academics with “deleted” articles, regardless of gender or race. We found that white men whose pages were “kept” had a significantly higher median Primer Index (Median = 12.00) than white men whose pages were deleted (Median = 8.00, p = .0093 using Kruskal-Wallis test with post-hoc Dunn's multiple comparisons) (Figure 2). However, this observation did not hold true for white women or for BIPOC academics on Wikipedia. There was no statistically significant difference in the median Primer Index between kept and deleted pages for white women or for BIPOC academics (Figure 2). This finding indicates that there is a meaningful difference in WP:Search Engine Test outcomes for kept versus deleted white males but that the WP:Search Engine Test is not an accurate predictor of Wikipedia persistence for female and BIPOC academics. Moreover, our data suggest that the process might be more random for white women and BIPOC academics such that not all subjects are being “searched” before a deletion decision is rendered and that even women and BIPOC scholars with high online presence scores might be deleted.

Figure 2. A bar chart comparing median Primer Indices across race and gender, showing the median as box height with error bars representing the 95% confidence interval.
Given the discrepancy in online presence predictability, our data suggest that editors are not engaging with established mechanisms to assess notability and that other factors are being used to determine notability for female and BIPOC academics. Not relying on WP:Search Engine Test might contribute to the “miscategorization” of notable academics (Tripodi, 2021). According to Tripodi (2021), a “miscategorized” biography is one that was nominated for deletion despite meeting Wikipedia's standards for inclusion. Analyzing miscategorizations provides social scientists with a way to study how women's accomplishments are perceived by Wikipedia administrators (Tripodi, 2021). To better understand the criteria being used to decide whether an article is kept or deleted, we performed a miscategorization analysis of our own data and conducted a thematic content analysis of the comments associated with the nomination.
Consistent with Tripodi's (2021) earlier work, we found that the pages of female academics, regardless of race, are more likely to be kept after nomination for deletion compared to pages for male academics (X2 (1, N = 843) = 7.0, p = .0081). While a “keep” might seem like a positive outcome, it adds to the invisible labor necessary to close the “gender gap”—an emotional and time commitment typically assumed by women editors (Menking and Erickson, 2015; Tripodi, 2021).
However, when we controlled for gender to look at patterns of racial miscategorization, the pattern could not be extended more broadly. In fact, pages for white academics were more likely to be kept after nomination for deletion than pages for BIPOC academics (X2 (1, N = 843) = 24.19, p < .0001). Furthermore, when we compared these results to the online presence score, our data indicate that BIPOC academics who meet Wikipedia's criteria (i.e., a Primer Index above the white male “keep” median of 12.00) were among those deleted (Figure 2). For example, Tonya Foster, a professor of creative writing and Black feminist scholar at San Francisco State University, had a high Primer Index of 41, yet her Wikipedia page was deleted. Another example is the late Sudha Shenoy, an economist and professor of economic history at the University of Newcastle, Australia, who had a high Primer Index of 198, yet her page was also deleted. These findings indicate that BIPOC scholars who meet inclusion thresholds might still be erased. Given these results, we sought to better understand the rationale behind the decisions to delete a biography. To do so, we conducted a thematic content analysis of deleted articles and their associated AfD discussions.
“Too Soon” for a Wikipedia page
When analyzing the AfD discussion pages of deleted academics, we noticed that a substantial number of pages nominated for deletion were tagged as WP:Too soon. According to Wikipedia guidelines, some topics need more time (i.e., the biography or page topic was submitted “too soon”) to obtain substantial coverage from reliable, independent sources before they can satisfy Wikipedia notability standards.
To systematically study the application of WP:Too soon, we searched all deleted pages in our dataset for the presence of WP:Too soon in the AfD discussion. We found that gender was the most salient commonality. Female academics were significantly more likely to have the tag WP:Too soon in their AfD discussion compared to male academics (X2 (1, N = 843) = 34.50, p < .0001). The tag appeared in 35% of deleted women's pages but only 9.6% of deleted men's pages. These data are the first to suggest a disproportionate and gendered application of the tag WP:Too soon among pages of deleted academics on Wikipedia. Therefore, we performed a content analysis of the context in which WP:Too soon was applied, and documented subtle differences between male and female subjects.
First, let us illustrate the usage of WP:Too soon per Wikipedia guidelines. The following excerpt is from the AfD for the biography of a white, male, assistant professor who was nominated for deletion under the tag WP:Too soon: “Most of the newspaper articles cited in the main article are not directly related to the subject, and apart from this brief article in the Dainik Jagran that borders on being a hagiography of the subject, there's no real coverage for WP:GNG. WP:Too soon perhaps.”
As this moderator noted, the subject's page cited articles that were not directly about him and lacked adequate coverage to support notability, even after a thorough online search. Although the academic was an assistant professor, the moderator focused on the subject's media coverage rather than his career stage, in accordance with the Wikipedia guidelines for the tag WP:Too soon.
However, we noticed that women's pages more frequently carried a WP:Too soon label and that the designation was often used in reference to their career status, a rationale outside Wikipedia's guidelines for the tag WP:Too soon. These pages were subsequently deleted because the individual was deemed too early in her career to be featured on Wikipedia. The following examples from AfD discussions all fail to mention the presence or depth of media coverage:
“Delete per WP:PROF and WP:Too soon. She has respectable citation counts for a postdoc, but postdocs (and assistant professors and the UK/Irish equivalents) are usually too early in their career to have attracted enough attention to their works for academic notability, and [X] does not appear to be an exception to this general rule.”
“Delete as far WP:Too soon. Assistant professors are usually not notable and this is no exception.”
“I agree it looks to be WP:Too soon. If there are articles on male scientists of a similar early career stage, they should be nominated for deletion. The creating editor seems to misunderstand the level of notability required for academics.”
Our dataset revealed that men at similar early career stages were present on Wikipedia. For example, Colin G. DeYoung, an assistant professor of psychology at the University of Minnesota, had his page kept after a nomination for deletion. There was no mention of WP:Too soon in the AfD discussion, which contained only three responses, all of which voted “Keep” on the basis of citation count. Further, Adams et al. (2019) document that both assistant and associate sociology professors are regularly represented on Wikipedia.
Overall, men's pages were significantly less likely to be tagged as WP:Too soon. When applied, the tag's use often aligned with Wikipedia guidelines, referencing the subject's limited online presence, or moderators discussed WP:Too soon in reference to a “conflict of interest” making it difficult to assess notability. In other words, some men were either writing pages about themselves or using paid writers, which qualified the pages for deletion per Wikipedia guidelines:
“delete promotional as hell, as written. Appears to be WP:Too soon.”
“And the supposed references for the article are merely more things the subject has written, not anything that can be used as a reference about the subject. Far too soon.”
“Delete as WP:Too soon. Very low GS citations for a highly cited field. BLP bears the hallmarks of promotionalism. A WP:PAID contributor, [X], made a bunch of WP:ER's which have sat in Category:Requested edits since 6 May.”
“Delete Too low citation count, and no obvious reasons to pass WP:NPROF, perhaps WP:Too soon, but really delete because WP:NOTCV. Fails WP:GNG. All the references are from www.cprindia.org (where the subject apparently works). Include two references to Livemint and Financial Chronicle, apparently written by the subject.”
To quantify these descriptive findings and to determine whether there was truly a significant difference in the context in which WP:Too soon was used on male versus female pages, we returned to the AfD discussion page for each individual in the dataset with a WP:Too soon tag (n = 61) and scored the context in which the tag was used. Overall, our data indicate that WP:Too soon is disproportionately applied to biographies about women. Rather than discussing sources and citations, administrators more often reference a woman's career stage as the basis for using the tag. The frequency with which WP:Too soon was used in the context of career stage was significantly higher in AfD discussions for women than for men (two-sided Fisher's exact test, p = .0204).
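Fisher's exact test is the appropriate choice here because the subsample is small (n = 61), making chi-square approximations unreliable. As a sketch of one common two-sided formulation, the test sums the probabilities of all tables with the same margins that are no more likely than the observed one; the counts below are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch (not the authors' code) of a two-sided Fisher's exact
# test on a 2x2 table, suited to small samples like this one.
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table."""
    r1, r2 = a + b, c + d   # row totals
    c1 = a + c              # first column total
    n = r1 + r2
    total = comb(n, c1)

    def prob(x):  # probability of the table whose top-left cell is x
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # A tiny tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical table: rows = (women, men);
# cols = (career-stage rationale, other rationale)
p = fisher_exact_two_sided(18, 22, 4, 17)
print(f"two-sided Fisher's exact p = {p:.4f}")
```

This enumeration matches what `scipy.stats.fisher_exact` computes for the two-sided alternative, but avoids the external dependency.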
To probe this pattern further, we performed a quantitative analysis of how career stage differs between men and women labeled with WP:Too soon. We found that even though women were significantly more likely to receive a WP:Too soon label in the context of their career stage, there was no significant difference in average career stage between men and women in the dataset (p = .6391). The mean career stage for women with a WP:Too soon label was 1.21, while the mean for men was 1.07.
We also compared online coverage scores across men and women with WP:Too soon labels and found no significant difference in median online presence scores (p = .9268). Within our dataset, women flagged with WP:Too soon had neither significantly lower career stages nor lower online presence scores than their male counterparts. This reinforces the conclusion that WP:Too soon labels applied in the context of career stage for women are based neither on career stage nor on online coverage; rather, they are likely influenced by other subjective factors such as implicit bias. In other words, women are deemed “too soon” for inclusion even when they meet Wikipedia's own thresholds.
When looking exclusively at racial differences, there was no significant difference in the use of WP:Too soon (X2 (1, N = 402) = 1.525, p = .2168). When we analyzed the effect of gender while controlling for race, we found gender to be the most salient factor in the application of WP:Too soon. Among white academics, women were significantly more likely to have WP:Too soon in their deletion discussion (X2 (1, N = 244) = 19.75, p < .0001): 34.85% of white women's pages included the tag, compared with only 10.67% of white men's pages. Among BIPOC academics, women were likewise significantly more likely than men to have WP:Too soon in their deletion discussion (X2 (1, N = 138) = 10.18, p = .0014): 32.26% of BIPOC women's pages included the tag, compared with only 9.35% of BIPOC men's pages.
Discussion
We have found that components of Wikipedia's notability criteria are not applied consistently across race and gender for biographies of academics. While an online presence score accurately predicted whether a white male academic's page was kept on Wikipedia, our findings indicate that not all subjects are being adequately “searched” before they are nominated for deletion. Because online presence does not predict outcomes for other groups, white women and BIPOC academics face a greater uphill battle in getting their pages to stick: even those with high online presence scores are just as likely to be perceived as nonnotable subjects.
When looking more deeply into the content of the AfD discussions, we also found discrepancies in the application of the tag WP:Too soon. Women's pages, regardless of race, were more likely to have the tag cited as a rationale for deletion. However, a content analysis of the AfD discussions indicates that the tag was being used in the context of a woman's career stage rather than her media and online coverage. This was not the case for men.
In short, our data demonstrate that the Wikipedia protocol created to guide deletion decisions (governed by WP:Basic) is differentially applied at several levels across race and gender. It is important to consider these findings in the context of the multiple barriers women and people of color face in receiving recognition for their achievements.
When it comes to having a page on Wikipedia, women and BIPOC academics face both primary and secondary barriers to inclusion. Primary barriers to inclusion are linked to systemic inequality in academia, making it harder for women and people of color to rise to positions of power that are canonically seen as notable in academia (AAUP, 2022; Hill et al., 2010). Not only are women and people of color less likely to move up in the academic hierarchy, but they are also less likely to receive media coverage of their achievements (Berger, 2020; Harrison, 2019). This lack of coverage is further exacerbated for historical female figures who had trouble receiving credit for their contributions to STEM, let alone public recognition of their achievements (Harrison, 2019). The same glass-ceiling effect can be extended to men of color who have been historically under-featured, even to the extent that they were excluded from American history (Rainone, 2020).
Our findings indicate that even when subjects hold a significant online presence or have overcome obstacles that often prevent participation in academia, their achievements do not count in the same way on Wikipedia as they do for their white male peers. In other words, even when white women or BIPOC academics meet these hard-to-achieve standards, it does not guarantee their Wikipedia page will stick (Hengel, 2017; Wagner et al., 2016). In 2002, sociologist Eszter Hargittai (2002) explained that the digital divide is about more than just access and that a “second-level digital divide” exists when it comes to skillsets. We argue that gender and racial inequality on Wikipedia can also be conceptualized as a “second-level” divide. The first level is related to broader systemic issues that Wikipedia cannot fix, discussed above. Compared to men, fewer women and people of color hold late-career status positions such as tenure, full-professorship, or distinguished chairs in academia (AAUP, 2022). This difference can be attributed to bias in hiring and promotion as well as an increased burden of caregiving placed on women (Gibson et al., 2020). However, our research illuminates second-level inequality that is distinctly tied to internal practices at Wikipedia. Our data document the ways in which subjective application of notability criteria (WP:Search Engine Test and WP:Too soon) benefits white male academics while providing no meaningful benefit, or in some cases creating an additional hindrance, for women and BIPOC academics. This subjective application of online presence contributes to the process of miscategorization (Tripodi, 2021), adds more nuance to the way in which miscategorization plays out, and provides further examples of secondary barriers.
Networks: the invisible labor behind inclusion on Wikipedia
Our data indicate that race does not guarantee “saving” on Wikipedia to the same degree documented in other studies (Tripodi, 2021). Many BIPOC academics with high online presence scores were still deleted from Wikipedia during our study. Since BIPOC academics faced an unexpected combination of unequally applied notability criteria and lower “rescue” rates, our findings indicate that mechanisms to ensure that notable BIPOC academics are not deleted from Wikipedia are lacking.
The striking finding that BIPOC pages have low rates of miscategorization but high rates of deletion for notable subjects speaks to the role networks play in saving biographies nominated for deletion. Groups like Women in Red (WIR) or Art + Feminism organize Wikipedia Edit-a-Thons devoted to closing the gender gap. As part of this network structure, volunteers keep tabs on articles after they are created (WP:WikiProject Women in Red; Tripodi, 2021). This structure has improved representation: Women in Red has increased the proportion of biographies about women on English Wikipedia from about 15% in 2015 to just over 19% in 2022 (WP:WikiProject Women in Red). This loose connection of “weak ties”—editors who meet infrequently and would be regarded as acquaintances rather than close friends—is central to forming new biographies about women, improving existing biographies, and protecting newly created content. The network in place to create, protect, and save women's pages has developed a variety of tools to help address systemic gender bias on Wikipedia, including archive alerts and bots that detect when an article created by the network is nominated for deletion. An example of the “strength of weak ties,” described by Granovetter (1973), is also documented by Vitulli (2017), who reports on a system of editors “on standby” to help ensure that the pages they created about women scientists persisted. Avid members of the WIR community, such as Wade, often involve themselves in AfD discussions surrounding the deletion of women's pages.
However, the need to create a robust network of volunteers to both generate and monitor pages to avoid miscategorization puts an undue burden on editors—often women—who already take on substantial emotional labor when engaging with Wikipedia (Menking and Erickson, 2015). As researchers have described, the AfD environment can become hostile and deter new editors from editing again—especially when their pages get deleted (Bear and Collier, 2016; Eckert and Steiner, 2013; Jemielniak, 2014; Menking et al., 2019; Peake, 2015). Weak ties are bound to lose their elasticity when editors fighting for biographies about notable women must deal with sexist remarks, learn how to navigate hostile environments, and/or dedicate more time and emotional energy to prevent miscategorized pages from being deleted (Menking et al., 2019; Tripodi, 2021).
Moreover, this process relies on such a network existing. We posit that part of the reason why pages of notable BIPOC academics are being deleted is that there is no robust and systematic advocacy group tracking when their pages are nominated for deletion (McDonough, 2021). At a broader scale, research, as well as advocacy, for BIPOC groups is severely lacking on Wikipedia. The Humaniki project, which identifies knowledge gaps on Wikipedia, does not treat race/ethnicity gaps as a central focus, though it should be a priority (Humaniki: Wikimedia Diversity Dashboard Tool, n.d.). Without a vibrant network dedicated to saving these pages, BIPOC academics do not end up “miscategorized” but rather, systematically erased. While creating an initiative devoted to saving BIPOC pages that is as robust as WIR may help improve the sustainability of notable pages, the creation of such a network might inevitably put the burden on those already being deleted, contribute to editorial fatigue, and lead to editorial attrition.
At the same time, edit-a-thons have positive impacts on the Wikipedia editing community and provide support for new editors (Evans et al., 2015; March and Dasgupta, 2020; Sengupta and Ackerly, 2022). The success of these events is, in part, due to their documentation practices. In the same way that events cover how to create a hyperlink or write from a neutral point of view, perhaps they could also stress the importance of monitoring flagged pages to ensure their longevity. Since WP:Too soon is more likely to be applied to junior faculty, perhaps these events could center on scholars who are already in more senior roles to avoid setbacks for new editors. When a new member is unsure how to respond in an escalating AfD discussion, having more experienced editors around, as is often the case during an edit-a-thon, can help. Unfortunately, dramatically changing bias and removing unwanted behaviors from Wikipedia would take a much larger shift in the composition of editors. While a revision of Wikipedia guidelines might help, our data indicate that well-intentioned guidelines (e.g., WP:Search Engine Test) are not necessarily followed.
Limitations
While our project begins to address the lack of documentation and understanding of racial bias on Wikipedia, there are limitations to our methods and data. As we described earlier, the machine learning model was trained on imperfect datasets and low Wikipedia representation forced us to combine populations into one “BIPOC” group to have enough power to ask questions surrounding race within our dataset. Such a “solution” confounds race and ethnicity and risks quantifying race as a statistic without considering the everyday realities of racism (Zuberi, 2001).
It is important to note that our need to combine biographies reflects the lack of diversity in academia broadly. Black and Hispanic men and women only make up 2% of full-time academic professors in the United States (NCES, 2018). The limitation of only comparing white versus BIPOC impacts our ability to explore specific instances of racial biases and limits our ability to understand if we have diluted our effects on individual racial groups. Further, we cannot account for the specific role anti-Black racism might play in assessing inclusion on Wikipedia in a way that qualitative research could potentially address. Our dataset and analysis are also limited in that we could only examine how notability criteria are applied to cisgender and binary biographies. More research is needed on how perceived notability impacts the LGBTQ+ community.
Conclusions
Overall, we have identified a useful analytic process to study inequality and described how criteria for notability are inequitably assessed on Wikipedia. Since the notability criteria themselves are difficult to change, and may not even be accurately applied, we need to look towards other approaches that could lead to a more objective and equitable application of the notability criteria across race and gender.
Wikipedia is situated in a society that has historically devalued the voices of women and people of color. Time and time again, diversity work is disproportionately completed by marginalized and underrepresented people, including on Wikipedia (Menking and Erickson, 2015; Porter et al., 2018). Currently, the weak-tie networks that save women's pages on Wikipedia are composed mostly of women, but this does not have to be the case. Instead of continually placing the burden on underrepresented individuals to undertake emotional work, particularly when they already feel unwelcome on Wikipedia (Menking and Erickson, 2015), weak-tie networks could be composed of established editors who are not underrepresented. Further, the creation of a Wikipedian identity, which has historically happened on Wikipedia itself (Bryant et al., 2005), could instead be forged on other online platforms such as Reddit, Facebook/Instagram, or Twitter. Creating alternative networks off Wikipedia to track flagged and/or deleted pages of marginalized groups might help foster collaboration and create a space where editors could continue to contribute to equity-driven efforts on Wikipedia across platforms.
Working towards a more equitable and diverse Wikipedia goes beyond bettering the encyclopedia itself. The inclusion of marginalized voices on Wikipedia is critical to community innovation (Menking et al., 2019) and the availability of free online knowledge. The existence of diverse biographies on Wikipedia positively shapes the public perception of who can be a scientist, writer, artist, academic, etc. (Ezell, 2021; Salam, 2019). Diverse participation in academic fields increases productivity and problem solving (Gibbs, 2014). Addressing the bias on Wikipedia will also improve the training of artificial intelligence on Wikipedia data (Robitzski, 2017). New natural language processing systems are being trained on Wikipedia data to interpret and mimic human writing (Khurana, 2021), but if the data overtly express gender and racial biases, then these new systems will adopt these biases as well.
We hope that our work adds to the growing literature on gender and racial biases that exist on Wikipedia. While our research does not offer specific solutions, we argue that awareness of where these biases manifest on Wikipedia can influence Wikipedia community members who both write pages and make inclusion decisions.
Acknowledgments
We want to acknowledge Danielle Bassett for help applying the deep learning algorithm and Paul Lee for help adapting the code to our dataset so we could more meaningfully understand how race impacts perceived significance. We also thank John Bohannon, Director of Science at Primer, who granted us access to the Primer code, and Zein Tawil, who worked with us to ensure the precision of the tool. This resource helped create the proxy for online presence, a key datapoint in this paper. We appreciate the support of the Center for Information Technology and Public Life at UNC-Chapel Hill and the extra set of eyes in the coding and proofing process provided by Aashka Dave and Yuyu Yang. Finally, we want to thank Jackson Burton, who gave important feedback throughout the entire research and writing process.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Correction (July 2023):
Since the original online publication, this article has been updated to correct a minor labeling error in Figure 1.
