Abstract
Gender is one of the most pervasive and insidious forms of inequality. For example, English-language Wikipedia contains more than 1.5 million biographies about notable writers, inventors, and academics, but fewer than 19% of these biographies are about women. To improve these statistics, activists host “edit-a-thons” to increase the visibility of notable women. While this strategy helps create biographies that did not previously exist, it fails to address a more inconspicuous form of gender exclusion. Drawing on ethnographic observations, interviews, and quantitative analysis of web-scraped metadata, this article demonstrates that biographies about women who meet Wikipedia’s criteria for inclusion are more frequently considered non-notable and nominated for deletion compared to men’s biographies. This disproportionate rate is another dimension of gender inequality previously unexplored by social scientists and provides broader insights into how women’s achievements are (under)valued.
Introduction
On March 7, 2014, a biography for Donna Strickland, the physicist who invented a technology used by all the high-powered lasers in the world, was created on Wikipedia. In less than six minutes, it was flagged for a “speedy deletion” and shortly thereafter erased from the site. 1 This decision is part of the reason Dr. Strickland did not have an active Wikipedia page when she was honored with the Nobel Prize in Physics four years later. Despite clear evidence of Dr. Strickland’s professional endeavors, some did not feel her scholarly contributions were notable enough to warrant a Wikipedia biography. My research demonstrates that the perceptions of Dr. Strickland’s accomplishments are not an anomaly. What happened to her biography fits a broader pattern in which biographies of women who merit a Wikipedia page are disproportionately perceived as non-notable.
Much scholarly work demonstrates the extent of gender inequality on Wikipedia. Women in all fields are underrepresented, articles about women’s interests are underdeveloped, and women are less likely to edit Wikipedia articles (Adams et al., 2019; Adams and Brückner, 2015; Ford and Wajcman, 2017; Hargittai and Shaw, 2015; Hill and Shaw, 2013; Jemielniak, 2014; Konieczny and Klein, 2018; Reagle and Rhue, 2011; Shaw and Hargittai, 2018; Torres, 2016; Tripodi, 2017; Wagner et al., 2015; Wagner et al., 2016). Researchers have also noted the hardships women face when editing Wikipedia, documenting the need to consider safety risks involved before editing certain topics or entering contentious spaces (Menking and Erickson, 2015; Menking et al., 2019; Press and Tripodi, 2021). However, this rich and extensive body of research does not consider whether gender inequality on Wikipedia is deeper than content underrepresentation or editorial constraints. Moreover, little work has focused on the connection between the documented “gender gap” (Eckert and Steiner, 2013) and how the notability of articles created in response to this gap is evaluated after creation. This absence in the scholarship opens up the opportunity to consider the layered complexities of gender discrimination and expose nuanced practices typically hidden from view.
Through ethnographic observations of events designed to improve gender equity on English-language Wikipedia (“edit-a-thons”) and web-scraped metadata of biographies nominated for deletion on English-language Wikipedia from January 2017 to February 2020, this article documents how the interpretation and application of Wikipedia’s notability guidelines play a critical role in the perpetuation of gender inequality on the site. Specifically, my data indicate that biographies about women who meet Wikipedia’s criteria for inclusion are more likely to be considered non-notable than men’s. Because women’s biographies face additional hurdles to remain active pages, groups committed to closing the gender gap must bear that burden. These findings also shed light on how women’s contributions to society are contested in the twenty-first century and the extent to which a person’s gender affects their perceived significance.
Theoretical background
Gender trouble on Wikipedia
Wikipedia provides social and information scientists with an accessible way to study the persistence of gender inequality in the twenty-first century. Based on internal and external studies of English-language Wikipedia, women’s biographies are underrepresented and underdeveloped (Callahan and Herring, 2011; Lam et al., 2011; Wagner et al., 2016). Many notable women are missing from the site altogether (Adams et al., 2019; Luo et al., 2018; Reagle and Rhue, 2011). Regardless of the field of study, scientific achievement, or h-index, being male increases the chance of being recognized and featured on Wikipedia (Schellenkens et al., 2019).
Women’s pages are also more likely to feature language to indicate the person is a woman (Wagner et al., 2015). Overuse of gendered language (e.g., first female mayor, wife of) reaffirms a gender binary in a way that not only acts upon our preexisting ideas of men versus women but also shapes and forms the subject (Butler, 1990). Marking women’s pages with gendered language reifies a heteronormative hierarchy, creating a precedent that a notable person is presumed to be male unless otherwise stated. Moreover, women’s pages are less likely to link to other pages, whereas men’s pages are well connected throughout Wikipedia (Wagner et al., 2015). Gendered networking is particularly important because hyperlinks work like “magnets” attracting more editors to the page (Aaltonen and Seiler, 2015) and biographies about women have a better chance of surviving if they link to an existing Wikipedia page (Vitulli, 2017). In other words, when women’s biographies are not hyperlinked to other articles, they are less likely to improve over time, less likely to be read, and more likely to be deleted.
Not only are women’s pages underdeveloped (Adams and Brückner, 2015), but women are also less likely to edit Wikipedia. Studies have found that male editors make up an overwhelming majority of the community with estimates ranging between 70% and 80% (Meyer, 2013; Wikimedia Foundation, 2011). Researchers studying the persistence of gender inequality on Wikipedia have found women are reluctant to edit because the interface is not readily accessible and that the “pipeline” for participation is effectively broken (Hargittai and Shaw, 2015; Shaw and Hargittai, 2018).
In order to manage their personal safety, women editors often work in the “quiet corners” of Wikipedia, avoiding topics or areas prone to harassment (Menking et al., 2019; Press and Tripodi, 2021: 140). The need to create safe spaces and tread lightly in discussions are just some of the many reasons Wikipedia participation requires a “taxing level of emotional labor” for women editors (Menking and Erickson, 2015: 209). This hostile environment deters women from continued participation in the community (Bear and Collier, 2016; Eckert and Steiner, 2013; Fister, 2016; Jemielniak, 2014; Peake, 2015), especially because many women do not think Wikipedia does enough to deal with the problem of online sexism (Gauthier and Sawchuk, 2017; MacAulay and Visser, 2016; Menking, 2015; Paling, 2015). To be sure, Wikipedia’s ability to address these concerns is constrained, given the site’s volunteer structure and limited editorial oversight.
Despite this rich and extensive work on gender discrimination on Wikipedia, little scholarly attention has been paid to how scrutiny over the notability of women subjects hinders editors committed to closing the documented gender gap on the site.
Deletionism and notability on Wikipedia
Determining whether content should be included on Wikipedia is fraught with contestation. These boundaries of inclusivity are drawn around myriad reasons, including the reputability of sources used (Luyt, 2012; Luyt and Tan, 2010), the political nature of the subject (Shi et al., 2019), and whether the article is written in a neutral tone or about a worthy enough subject (Gauthier and Sawchuk, 2017; Matei and Dobrescu, 2011). While deletionists believe that articles that cover obscure content or do not receive significant attention weaken the encyclopedic nature of the site, inclusionists favor a “long tail” approach to Wikipedia, given the nearly limitless space constraints of the Internet (Lam and Read, 2009). Research on deletions reveals that the most frequently used rationale for deleting an article was that it had “no indication of importance” (Geiger and Ford, 2011: 201; Lam and Read, 2009), and deletions due to a non-notability classification have increased over time (Lam and Read, 2009).
Wiki-notability means that the topic/subject has received significant coverage in reliable sources that are independent of the subject (WP: NOTE). Despite the presumption of consensus among Wikipedians, “neutral” roles and formalities on the site embody subjectivity and bias in their application and effect (Luyt, 2012). Studies show that women’s biographies are slightly more notable than men’s (Wagner et al., 2016), and the level of activity and traffic on Wikipedia articles dedicated to female scholars are not proportionate to their scientometric achievements (Samoilenko and Yasseri, 2014). Scholars have repeatedly voiced concern that wiki-notability is inconsistently enforced, arbitrarily assessed, and biased against women (Gauthier and Sawchuk, 2017: 391; Kramer, 2019; Vitulli, 2017).
To date, most scholarship on wiki-notability has focused on whether an article is deleted. Adams et al.’s (2019) analysis of approximately 6,323 threads found that women academics were not more likely to be deleted. However, they did not analyze whether women subjects who met wiki-notability criteria were more likely to be targeted for deletion through the nomination process. As Crawford and Gillespie (2014) argue, a flag might appear to be a single data point (in this case, “deletion”), but these tags are often tangled up in a system’s design and users’ intentions. Understanding the layered processes within Articles for Deletion (AfD) and how they relate to gender bias is an important factor in understanding women’s (under)representation on Wikipedia. If women who meet Wikipedia’s threshold for inclusion are more likely to be nominated for deletion than men, it creates an additional hurdle for editors determined to close the gender gap on the site.
This study combines qualitative observations with statistical analysis on all biographies nominated for deletion over a three-year period (n = 22,174) and focuses explicitly on nominations determined to meet wiki-notability criteria – articles that received a “keep” decision. While a “keep” decision might seem positive, it also acts as a marker for miscategorization. “Keep” means the article meets wiki-notability and the criteria for inclusion on Wikipedia. Miscategorization is important because it widens the discussion around notability bias and provides social scientists a way to study how women’s accomplishments are perceived outside of an experimental setting. By focusing on kept biographies, I more thoroughly examine whether women are more likely to be considered non-notable than men.
Expanding upon Konieczny and Klein’s (2018) important work, which confirms that gender inequality can be analyzed and quantified on a large scale, I argue that evaluating the extent to which women’s biographies are miscategorized as non-notable should be included as another indicator when trying to measure worldwide differences in gender equality relevant to existing human development indices. Miscategorization also sheds light on another dimension of the emotional labor that editors endure when trying to close the Wikipedia gender gap (Menking and Erickson, 2015). Given the increasing role Wikipedia plays in shaping Google’s search returns (Lewandowski and Spree, 2011; McMahon et al., 2017) and teaching AI systems (Robitzski, 2017), discrediting the significance of women subjects holds wider implications than just Wikipedia representation and provides a broader understanding of how women’s accomplishments are undervalued.
Data and methods
This is a mixed methods study drawing on ethnographic observations from events designed to improve the representation of women on English-language Wikipedia and web-scraped metadata from “Articles for Deletion” (AfD).
This exploratory sequential design began with hundreds of hours of ethnographic observations at 15 edit-a-thons from 2016 to 2017. Edit-a-thons are daylong events designed to improve the representation of women on Wikipedia while also providing a safe space for new editors—primarily women—to learn how to contribute to Wikipedia (Lavin, 2016; Menking, 2015; Press and Tripodi, 2021; Sayej, 2018; Thomas, 2017). In addition to edit-a-thons, I also attended two large-scale Wikipedia events, smaller meetups, happy hours, and two regional chapter meetings. In-depth interviews with 33 individuals (23 Wikipedians and 10 new editors) 2 were conducted outside participant observation spaces. I coded my fieldnotes and transcriptions in two stages. First, I conducted an open coding (Charmaz, 2006), consisting of listening to recorded interviews while reviewing my fieldnotes and writing down emergent ideas on a series of notecards. Second, I arranged these cards in clusters, identifying which themes were the most salient. After flagging particularly salient “in vivo codes” (Charmaz, 2006), I conducted a more focused coding of my transcript data, determining the accuracy of the threads identified.
Through my ethnographic observations, I learned about AfD. AfD is a process where Wikipedians can examine articles under scrutiny, add to discussions about the merits of the article, and determine whether it should be kept, deleted, or merged/redirected to an already existing article. The AfD archive 3 is a searchable database of every Wikipedia article nominated for deletion. I enlisted the help of a computer scientist to write a script 4 to scrape AfD daily log pages for a specific day, for a specific month, or for all the pages linked to the open index. The metadata were then filtered for tags or phrases indicating that the entry in question was a biography. 5
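The collection step can be sketched roughly as follows. This is a minimal illustration and not the script used in the study. It assumes that AfD daily logs follow the page-title pattern "Wikipedia:Articles for deletion/Log/<year> <Month> <day>", it makes no network requests, and the phrases in BIO_HINTS are hypothetical stand-ins for the tags and phrases the study filtered on.

```python
from datetime import date, timedelta

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical phrases for illustration; the study's actual filter terms
# are not reproduced here.
BIO_HINTS = ("biographical", "living people", "people-related")


def afd_log_titles(start, end):
    """Yield one assumed AfD daily log page title per day, start to end inclusive."""
    day = start
    while day <= end:
        yield (
            "Wikipedia:Articles for deletion/Log/"
            f"{day.year} {MONTHS[day.month - 1]} {day.day}"
        )
        day += timedelta(days=1)


def looks_like_biography(entry_text):
    """Return True if the entry text contains any of the (assumed) biography hints."""
    text = entry_text.lower()
    return any(hint in text for hint in BIO_HINTS)


# The study period: January 2017 through February 2020.
titles = list(afd_log_titles(date(2017, 1, 1), date(2020, 2, 29)))
```

Generating the titles up front makes it easy to scrape a single day, a single month, or the whole period by slicing the same list.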
The dataset created from this script consists of biographies nominated for deletion from January 2017 to February 2020. I focused explicitly on biographies because the edit-a-thons I was observing were organized around adding biographies about women to try and close the documented gender gap. After exporting to a comma-separated values (CSV) file, data were manually cleaned and coded in Excel for statistical analysis. I analyzed nominations by month for the entire year of 2017, 2018, and 2019 and the first two months of 2020 (totaling 22,174 biographical entries around a she or he gender binary). A subset of the articles focused on transgender and non-binary subjects. These were also analyzed by the researcher but were not included in the chi-square analysis. A condensed sample of the cleaned and coded dataset is available in Appendix A. Access to the GitHub code and a full copy of the raw dataset is available upon request.
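To illustrate the monthly tabulation, the sketch below summarizes a coded CSV by month. The column names (month, pronoun, outcome) and the rows in SAMPLE_CSV are invented for illustration and are not drawn from the study's dataset, which was cleaned and coded manually in Excel.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; column names are assumptions.
SAMPLE_CSV = """month,pronoun,outcome
2017-01,she,keep
2017-01,she,delete
2017-01,he,delete
2017-01,he,delete
2017-01,he,keep
2017-01,he,delete
2017-02,she,delete
2017-02,he,delete
2017-02,he,keep
2017-02,he,delete
"""


def monthly_summary(csv_text):
    """Return {month: (share of nominations about women,
                       kept rate for women, kept rate for men)}."""
    # For each month and pronoun, track [nominated, kept].
    counts = defaultdict(lambda: {"she": [0, 0], "he": [0, 0]})
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["pronoun"] not in ("she", "he"):
            continue  # entries outside the she/he coding are analyzed separately
        cell = counts[row["month"]][row["pronoun"]]
        cell[0] += 1
        cell[1] += int(row["outcome"] == "keep")

    summary = {}
    for month, by_pronoun in counts.items():
        she_n, she_kept = by_pronoun["she"]
        he_n, he_kept = by_pronoun["he"]
        summary[month] = (
            she_n / (she_n + he_n),
            she_kept / she_n if she_n else None,
            he_kept / he_n if he_n else None,
        )
    return summary


summary = monthly_summary(SAMPLE_CSV)
```

Each month then yields the two quantities the hypotheses rely on: the share of nominated biographies that are about women, and the rate at which nominated biographies are kept, by gender.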
Based on ethnographic observations and other published accounts of notability bias (Gauthier and Sawchuk, 2017; Kramer, 2019; Schellenkens et al., 2019; Vitulli, 2017), I tested the following hypotheses:
H1. The proportion of biographies about cis-gender women (she/her/hers) nominated for deletion each month will be greater than the proportion of available biographies about cis-gender women (she/her/hers) on Wikipedia during the same time period.
H2. Articles about cis-gender women (she/her/hers) are more likely to be misclassified as non-notable (i.e. “kept”) than articles about cis-gender men (he/him/his).
I used descriptive statistics to determine the percentage of nominations and misclassifications and relied on a chi-square analysis to test the proportional significance. If the process of nominating articles to AfD is not biased by gender, then the percentage of articles about women nominated for deletion should roughly match the proportion of available biographies about women on English-language Wikipedia. If no gender bias exists, the percentage of miscategorized biographies should not vary by gender.
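A chi-square test of this kind can be sketched for a two-by-two table (gender by outcome). The function below is a generic Pearson chi-square without continuity correction, written with the standard library only. It is not the study's analysis code, and the counts passed to it are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for a single month, NOT taken from the study.
# Rows: women, men. Columns: kept, not kept.
stat, p = chi_square_2x2(30, 90, 50, 330)
```

In this invented table, 25% of women's nominated biographies are kept versus about 13% of men's. A small p-value would indicate that a gap of this size is unlikely if the kept rate did not vary by gender.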
Findings
Edit-a-thons and perceived notability
Sitting in small groups of three around square tables, roughly 15 volunteers typed busily on their laptops as daylight waned through the large window that faced the street. These volunteers had gathered for a “Women in STEM” event, an edit-a-thon focused on writing notable female scientists into Wikipedia. Of those in attendance, only two were men and they were established Wikipedians looking to encourage newcomers to discover a passion for editing they found long ago. While most established Wikipedians were in their late 40s and early 50s, the new volunteer editors were in their early 20s. They were vibrant women, most of them also budding scientists, who were tired of seeing women they learned about in their coursework missing on Wikipedia.
At the front of the room was a whiteboard containing a list of 22 women: biologists, neuroscientists, anesthesiologists, botanists, and chemists who invented pharmaceuticals, surgical interventions, and life-saving materials that many of us rely on. Despite their contributions to their respective fields, none of them had a Wikipedia page. By partnering with a library, the edit-a-thon success was bolstered by its institutional resources. Writing women’s biographies can be an arduous process. As web sources are the most frequently used citations for establishing notability on Wikipedia (Huvila, 2010; Luyt and Tan, 2010), demonstrating a person’s significance can be difficult if they are not featured in electronic publications. By recruiting librarians dedicated to gender equality, archival materials about the women on the whiteboard were readily available. Next to a library cart of pulled books and articles at the front of the room, was a table with snacks and soda to keep the creative juices flowing.
Before new volunteers could start writing women into Wikipedia, they had to learn how to edit. Similar to Hargittai and Shaw’s (2015) and Eckert and Steiner’s (2013) findings, the new users I interviewed were drawn to edit-a-thons because they had read about the gender gap in the news, but were unsure how to engage. To help ease these tensions, established Wikipedians would start each event with a presentation, helping newcomers create a username and interact with the interface. These opening activities worked to create a “safe space” (Menking and Erickson, 2015) for newcomers who were nervous to edit Wikipedia.
Leah, a woman in her 20s with dark rimmed glasses and bright lipstick, described her trepidation: There’s a fear, whether the fear is I’m going to break it, or I’m going to not really know what I’m doing, or I’m going to feel out of my depth with this, or I’m going to feel overwhelmed . . . I’m not even sure where to start. It was mysterious and intimidating and I just didn’t know whether it was even appropriate for me to add information or what the standards were. I felt like I was breaking into someone else’s club.
Even during the edit-a-thon and with the help of Wikipedia mentors, the new editors expressed clear apprehension in editing, frequently noting that they were unlikely to have figured out how to do this on their own.
In interviews following the event, newcomers said that they enjoyed the process, but would not likely edit on their own because they still found the experience too frustrating. Most had attended the event in the hopes of adding hundreds of women. They were dismayed to learn that adding just part of an article had taken the entire day. Only one person I interviewed recalled their username/password just days following their participation in an edit-a-thon and none of the new editors had added the articles they created to their “watchlist”, a function that allows logged-in users to follow a page by clicking on the star icon in the upper right corner of an article.
Wikipedians who organized the event understood their frustrations and were concerned new recruits would not keep tabs on the articles they created during the edit-a-thons. To help ensure that content would “stick around,” Wikipedians would add articles they helped mentor to their own “watchlist.” When I asked Wikipedians why they felt the need to watch over new articles, I learned from Janet, an academic in her mid-50s who encourages classroom participation to improve women’s biographies, that it was common for “women being added at these events to be immediately flagged for deletion, or even sent to AfD where they would experience further sexism.”
The observation Janet noted was also confirmed by editors I interviewed who are affiliated with Women in Red (WiR). WiR is a group of editors committed to improving systemic bias on Wikipedia and closing the gender gap by focusing on creating content regarding women’s biographies, women’s work, and women’s issues. Their name derives from the practice of turning “red links” (pages that do not yet exist on Wikipedia) into blue (an active page). Since 2015, WiR has increased the percentage of women’s biographies to 18.93%, 6 but members are routinely aggravated by their efforts being undermined. During interviews, multiple WiR editors explained that they must “double-back” on their efforts because articles WiR add are constantly being flagged as non-notable and nominated for deletion.
To try and prevent women’s biographies from being deleted, Wikipedians devoted to closing the gender gap set up systems to protect newly created content. This same strategy was documented by Vitulli (2017: 7), who credited the seasoned editors she had “on standby” as the reason why notable female mathematicians she wrote into Wikipedia were able to “survive” on the site. This strategy for watching articles created during edit-a-thons might also explain why articles with a higher percentage of women editors are more likely to be protected (Lam et al., 2011). Akin to the findings of Niederer and Van Dijck (2010) who argue that Wikipedia is an intricate collaboration between human users and automated contributions, WiR evolved their sociotechnical system to implement the assistance of non-human contributors. By creating a bot, WiR are immediately notified when an article created by their network is nominated for deletion.
During an interview with a new editor who was studying to be a fashion designer, they suggested I look at the revision history of Lois K. Alexander Lane—a woman who played an integral role in memorializing the historic contributions of African American fashion designers. Lane founded two museums (the Harlem Institute of Fashion and the Black Fashion Museum), wrote a book, ran two boutiques, and designed her own clothes. Not only did her museums memorialize the contributions of prominent Black stylists (including the work of Ann Lowe), Lane used the spaces to give free courses in writing, English, mathematics, and African-American history. When she died in 2007, The Washington Post ran an obituary detailing her accomplishments and credited Lane as a prominent figure in the history of fashion (Bernstein, 2007). Her fashion archive is now on display at the Smithsonian National Museum of African American History and Culture.
Despite her professional accomplishments, Lane did not have a Wikipedia page until eight years after her death. Through data matching, I found that Lane’s biography was created during an edit-a-thon designed to increase coverage of African-American women on Wikipedia. According to edit history, her biography was pushed out of the main space by a Wikipedian who deemed Lane “a person not yet shown to meet notability guidelines.” 7 Analyzing the state of the original article through the page’s revision history, it is clear the preliminary entry included basic biographical and professional information as well as links to seven credible sources independent of the subject, including The Washington Post and the Smithsonian. Editors can evaluate wiki-notability using what Wikipedians refer to as the “Search engine test” (WP: GTEST). As an act of good faith, editors should search for the topic and attempt to find reliable sources before deciding on whether an article is notable enough for inclusion. Yet most of the information I learned about Lane was through a simple Google search of her name.
During interviews with established Wikipedians outside of edit-a-thons, frustrations regarding misclassification were palpable. Not only were they volunteering their weekends to organize and attend events designed to improve the coverage of women on Wikipedia, but they also had to devote a substantial amount of time to make sure the article would survive. Margaret, a passionate female editor in her 20s who regularly organizes edit-a-thons for improving coverage about women, described in an interview how an article she created about a feminist activist was categorized as “non-notable” and pushed into AfD only a few hours after she published the biography.
Her frustrations were echoed in a separate interview with another editor named Brenna who volunteers much of her free time to adding women onto Wikipedia and volunteers at edit-a-thons to help new editors learn the ropes of editing. Like the scientist Dr. Strickland, or Ms. Lane the fashion designer, a biography of a woman who pioneered the radio was also nominated for deletion after Brenna tried to create her page.
I came across her work in some really great archived newspapers and I made a Wikipedia page about her. Within a couple of hours, it was flagged for deletion because they, on the talk page, were like: “Um she’s not a notable figure. Why is she important? I don’t think this is worthy of Wikipedia.” . . . I had to dig deeper and find even more archival newspapers that made her indisputably notable . . . But it just sucked because, you know it’s kind of like, I feel like this is a recurring story . . . you have to work twice as hard to prove that the content is valuable and is worthy of being in.
Extra hurdles to establish notability add to the emotional labor women face when editing Wikipedia (Menking and Erickson, 2015). Having an article about a notable woman nominated for deletion is not only “annoying” but also “intrusive and degrading” (Kramer, 2019). Miscategorization means editors devoted to closing the gender gap must volunteer even more time to improving Wikipedia. They not only have to create the pages, but they need to monitor the new pages to be sure they are not immediately considered non-notable and erased.
Wikipedians I interviewed reported that when they broached concerns about notable women’s biographies being unfairly targeted for deletion, they were often told they were being “too sensitive” over content they had created. Thus, editors committed to writing articles about notable women frequently hid their emotions or frustrations as part of the “deep emotional work” necessary when confronting editors who dismissed their concerns about discrimination as “matters of clashing personality” (Hochschild, 1989: xxi; Menking and Erickson, 2015). Not unlike women in the workplace who are called over-sensitive for resenting sexual harassment (MacKinnon, 1979), Wikipedians who organized edit-a-thons explained that when they voiced concerns over notable women being nominated for deletion, they were told that they were taking the editorial process “too personally.”
When I asked Wikipedians I interviewed to show me articles they were referring to, it became clear that some articles about notable women were nominated for deletion while the edit-a-thon was still happening. As the subject was the focus of an edit-a-thon, their status in their respective field (e.g. STEM or art) should have been easily recognizable and because these events partnered with libraries, these biographies had the requisite number of sources to establish their wiki-notability. Why were biographies that met Wikipedia’s own notability guidelines being nominated for deletion? Was miscategorization happening to notable men too, or were women more likely to be considered non-notable? If women are presumed less notable, it creates another layer of time and energy needed to improve gender representation on the site. This obstacle also presents barriers to edit-a-thon success. Neglecting to follow a biography’s persistence means that women who have already established their credibility in a patriarchal system of accreditation might also “vanish” from our cultural memory (Luo et al., 2018).
Analyzing AfD metadata
To test the theory that biographies about women who meet Wikipedia’s threshold of inclusion are more frequently miscategorized as non-notable, I collected, categorized, and analyzed 38 months of AfD data (22,174 he or she biographies). I sought to compare if the overall percentage of biographies about women nominated for deletion each month was proportionate to the available biographies about women. If the nomination process was not being biased by gender, the proportions between these datasets should be roughly the same.
My dataset revealed that the proportion of women’s biographies nominated for deletion each month (out of all biographies nominated for deletion) was greater than the proportion of available biographies about women on English-language Wikipedia more generally. From January 2017 to February 2020, the share of biographies about women on English-language Wikipedia rose from 16.83% to 18.25%, 8 yet the percentage of biographies about women nominated for deletion each month was consistently over 25%. Some months, it was much higher. For example, 41% of the biographies nominated for deletion in April 2017 were about women, but only 16.93% of available biographies on Wikipedia were about women on April 30, 2017 (see Figure 1). Even though women still make up less than 19% of all available biographies on English-language Wikipedia, women routinely make up a quarter of the biographies nominated for deletion each month.

Figure 1. A side-by-side comparison of the portion of available biographies about women on Wikipedia versus the portion of women’s biographies nominated for deletion from January 2017 to February 2020.
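The April 2017 comparison can be made concrete with a short calculation. The two percentages (41% nominated, 16.93% available) come from the text above. The monthly total of 500 nominations is a hypothetical figure chosen only to show how a goodness-of-fit test against the available share would work.

```python
import math


def overrepresentation(nominated_share, available_share):
    """Ratio of women's share among nominations to their share of all biographies."""
    return nominated_share / available_share


def goodness_of_fit(women_nominated, total_nominated, available_share):
    """Chi-square goodness-of-fit (1 degree of freedom) of observed nominations
    against the counts expected if nominations mirrored the available share."""
    expected_w = total_nominated * available_share
    expected_m = total_nominated - expected_w
    observed_m = total_nominated - women_nominated
    stat = (
        (women_nominated - expected_w) ** 2 / expected_w
        + (observed_m - expected_m) ** 2 / expected_m
    )
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Percentages reported in the text for April 2017.
ratio = overrepresentation(0.41, 0.1693)

# HYPOTHETICAL month of 500 nominations, 41% of them (205) about women.
stat, p = goodness_of_fit(205, 500, 0.1693)
```

The ratio shows that women's biographies appeared among April 2017 nominations at well over twice their share of available biographies. The test statistic depends on the assumed total and should not be read as a result of the study.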
Disproportionate nomination is intimately connected to the underrepresentation of women on English-language Wikipedia. Even if just a few notable women are mistakenly deleted, it poses significant hurdles for closing the gender gap. These struggles are documented in the revision history of the WiR main page. For example, on February 19, 2018, the percentage of biographies about women had risen to 17.90%, but only a few weeks later (February 26, 2018) many of the biographies WiR added had been deleted from Wikidata, dropping that statistic back down to 17.53%. It took editors devoted to closing the gender gap seventeen months to get the percentage of women biographies back to 17.90% (they reached 17.91% on July 30, 2019).
I also wanted to determine whether women were more likely to be miscategorized as non-notable despite meeting wiki-notability guidelines. If no gender bias exists, the percentage of miscategorized women would be equal to the percentage of miscategorized men. If cultural beliefs value one gender as more worthy than another (Ridgeway, 2011), then the percentage of women categorized as “non-notable” but subsequently “kept” for inclusion would be higher than the percentage of men categorized as non-notable but kept.
My data indicate that women’s biographies are more frequently miscategorized as non-notable than men’s (see Figure 2). From January 2017 to February 2020, an average of 19% of all biographies nominated for deletion were kept, but roughly 25% of women’s biographies were miscategorized, compared with only 17% of men’s. This difference was statistically significant: χ2 (1, n = 22,174) = chi-square statistic value, p < .001.

Figure 2. A chart comparing the percentage of miscategorized biographies about women versus the percentage of miscategorized biographies about men.
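The comparison of miscategorization rates can be sketched as a Pearson chi-square test on a 2x2 table (gender of subject by kept versus deleted). The code below is a minimal illustration only: the counts are hypothetical round numbers chosen to mirror the reported 25% and 17% rates, not the study's actual data, and the function name `chi_square_2x2` is an assumption introduced here. The resulting statistic should therefore not be read as the value for the full dataset.

```python
import math


def chi_square_2x2(kept_a: int, deleted_a: int, kept_b: int, deleted_b: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table.

    Uses 1 degree of freedom and no continuity correction.
    Returns (statistic, p_value).
    """
    n = kept_a + deleted_a + kept_b + deleted_b
    row_a, row_b = kept_a + deleted_a, kept_b + deleted_b
    col_kept, col_deleted = kept_a + kept_b, deleted_a + deleted_b
    if 0 in (row_a, row_b, col_kept, col_deleted):
        raise ValueError("every row and column total must be positive")
    cross = kept_a * deleted_b - deleted_a * kept_b
    statistic = n * cross ** 2 / (row_a * row_b * col_kept * col_deleted)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, not the study's data: 25% of 1,000 nominated women's
# biographies kept versus 17% of 3,000 nominated men's biographies kept.
women_kept, women_deleted = 250, 750
men_kept, men_deleted = 510, 2490

statistic, p_value = chi_square_2x2(women_kept, women_deleted, men_kept, men_deleted)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2g}")
```

The same table could be passed to a library routine such as SciPy's `chi2_contingency` with `correction=False`, which should agree with this hand-rolled version. The monthly comparisons would repeat the test on each month's counts.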
In January 2017, June 2017, July 2017, and April 2018, women’s biographies were twice as likely as men’s biographies to be miscategorized as non-notable (p < .02 for each month). Both the statistical significance and the practical size of the observed difference strongly support the patterns identified during my ethnographic observations. Wikipedians trying to close the gender gap must work nearly twice as hard to prove women’s notability, devoting extra time to track the biographies they create to ensure notable biographies about women are not subsequently deleted.
Only once (June 2018) were notable men more frequently miscategorized, but this was not statistically significant (p > .15). In three months over the three-year period, my data could not reject the null hypothesis: the proportions of miscategorized biographies were effectively equal for men and women in October 2018, November 2018, and May 2019, and the differences between them were not statistically significant (p > .85). Although these results are not statistically significant, one might consider how a temporary shift toward equitable assessment of women’s notability may be correlated with the international coverage surrounding the miscategorization of Donna Strickland. It was shortly after Dr. Strickland won the Nobel Prize in Physics on October 2, 2018 that news outlets began covering the problem of notable women being targeted for deletion on Wikipedia. Unfortunately, the “Donna Strickland effect” was short-lived, and notable women were once again more likely to be miscategorized for the next six months.
As my qualitative and quantitative data demonstrate, the problem of underrepresentation on Wikipedia runs deeper than simply missing pages. Not only are Wikipedia’s notability criteria a barrier for women (Adams et al., 2019), even women who meet these stringent guidelines for inclusion are still more likely than men to be considered “non-notable” and nominated for deletion. Of course, wiki-notability is not static, and discussions surrounding a subject’s perceived significance are not inherently good or bad. Debate surrounding notability could be productive, as is the case when editors with differing political backgrounds debate the merits of an article (Shi et al., 2019). However, my findings indicate that noteworthy women are generally seen as less notable, especially since their purported significance is easily verifiable using the search engine test (see earlier Lane example). This finding, as well as the data demonstrating that women make up a greater portion of AfD than they do available biographies on Wikipedia, suggests that a subject’s gender, identifiable through pronouns or forename, is being used to make snap judgments regarding perceived relevance. Such a finding indicates that gender discrimination is multi-layered, revealing the extra hurdles women face when trying to establish cultural significance.
Some could interpret my data to conclude that more keeps mean the mechanisms of Wikipedia are working. While it is true that a “keep” decision means the biography was saved from deletion, it also means that editors trying to close the gender gap must be on standby if they want a biography they create to persist (Vitulli, 2017). Moreover, even content added and protected by established networks like WiR can still be deleted. As WiR archive alerts indicate, a significant number of notable women have already turned back to red, meaning their page no longer exists on Wikipedia (WikiProject Women in Red/Article Alerts, 2020). This means that even when systems designed to ensure notable women are “kept” are put in place, they are bound to fail if women are disproportionally considered non-notable to begin with.
Ensuring that biographies about notable women are kept also adds to the emotional labor and time documented by Menking and Erickson (2015). Volunteers already devote hundreds of hours organizing events, identifying notable women, and pulling sources that demonstrate their notability. My data indicate they must also devote time to monitoring new pages in case they are mistakenly identified as non-notable subjects. If an editor wants to cast a keep vote for a miscategorized article, it also means participating in AfD—what respondents in this study referred to as one of the “most male” and “most sexist” spaces on Wikipedia.
This means that when articles about notable women are nominated for deletion, it forces editors (many of whom are also women) to participate in interactions where they may not feel safe. As Menking et al. (2019: 9) note, highly involved Wikipedia editors avoid AfD discussions, which can become aggressive. Many of the editors I interviewed feel so uncomfortable in AfD that they do not participate in the discussions of biographies, even if they are invested in their survival. As one WiR editor described, “I don’t post a reply on AfD because that would be superfluous. I just try to improve the article as best I can and hope for the best.” If the survival rate of notable female subjects is dependent upon editors placing themselves in situations where they do not feel safe, and volunteering extra time to an already lengthy process, then the system in place is not sustainable.
This additional work may also contribute to more women editors quitting Wikipedia over time (Menking et al., 2019). Some Wikipedians try to avoid the AfD “drama” altogether by writing about obscure figures. As a prominent figure in WiR explained, she avoids writing about upcoming leaders in math and science like Clarice Phelps (see Kramer, 2019) and tends to just write about dead women who, she believes, will be less likely to “get noticed.”
Non-binary notability
My quantitative data also yield preliminary findings regarding articles about transgender subjects and individuals who identify as non-binary. Like biographies about cis-gender women, LGBTQ+ biographies are also more likely to use language that indicates their LGBTQ+ identity, such as “transgender actress,” and are frequently classified as non-notable despite meeting Wikipedia’s criteria for inclusion. For example, in March 2018, both the biography of kimura byol-nathalie lemoine (a non-binary, Korean-born, Belgium-raised activist, feminist, and artist) and the biography of Dominique Jackson (a Tobagonian-American transgender actress who starred in the critically acclaimed television series Pose) were nominated for deletion despite their established notability. In May 2018, the biography of Akkai Padmashali (a transgender activist who has received the second highest civilian honor of the state of Karnataka) was nominated for deletion. Fortunately, all three articles were kept. Padmashali’s biography received a “speedy keep” because the administrator who closed the nomination noted a “clear abuse of process and disruption.”
My data also suggest that some users might be repeatedly targeting women’s and LGBTQ+ biographies as non-notable and nominating them for deletion. After all, research demonstrates that the deletion process is heavily frequented by a relatively small number of long-standing users (Geiger and Ford, 2011). In some cases, users are not taking the time to nominate in good faith or to conduct the search engine test. Several pages of notable women I observed were nominated for deletion within a few hours of creation and sometimes during the edit-a-thon while the article was being created. One month, multiple biographies about women were nominated with the same phrase copied and pasted over and over: “I don’t see how she manages to pass our notability guidelines.”
Discussion and conclusions
In this article, I have documented another major hurdle when it comes to closing the “gender gap” on Wikipedia. In addition to concerted efforts by editors to create new pages on women subjects, attention must also be paid to perceived notability. My findings indicate women subjects are more likely to be considered non-notable even if they meet Wikipedia’s criteria for inclusion. As many of these biographies are created by organizations like WiR who actively seek out notable women to begin with, frequent miscategorization means more setbacks. Rather than spending time creating new pages, experienced editors must set alerts on the ones they have already created to ensure they do not get erased.
These findings enhance and expand upon reputable research demonstrating how Wikipedia’s evaluative mechanisms for inclusion are unequivocally connected to gender bias (Gauthier and Sawchuk, 2017; Kramer, 2019; Schellenkens et al., 2019; Vitulli, 2017). My findings also replicate the results of experimental studies conducted in the 1970s and early 1980s, in which researchers demonstrated that men are routinely considered worthier and more valuable than women (Berger et al., 1980; Broverman et al., 1972; Eagly and Wood, 1982). By studying Wikipedia data, sociologists can analyze how women’s notability is assessed over time without experimental constraints.
Unfortunately, the gender inequality observed in this and other studies is difficult to change and indicative of a larger structural problem. As Wikipedia is a semi-anarchic volunteer project, little editorial oversight exists. While the Wikimedia Foundation, a non-profit committed to free and open-sourced information projects, hosts Wikipedia, editors and admins are not beholden to the organization’s recommendations. Suggestions from Wikimedia can be disregarded or repealed (e.g., when Wikimedia tried to roll out a more intuitive and user-friendly editing interface).
Given that organizational influence is limited, one way we might explain (and improve) the gender discrepancies I observed in this article is to dig deeper into the interactive dynamics of Wikipedia’s editorial hierarchy. Gender is more effectively salient and more likely to implicitly shape behavior when interlocutors’ gender differs (Ridgeway and Correll, 2004; Ridgeway and Smith-Lovin, 1999). It could be that women occupy a lower status position within Wikipedia more broadly: they might be newly registered editors and less likely to hold administrative roles. This structural inequality might contribute to notable women’s biographies being perceived as trivial. Another explanation is that this phenomenon is not exclusive to Wikipedia, and that women are considered less-notable members of society more generally. Future studies could explore the extent to which the evaluative mechanisms highlighted in this study transfer to other environments.
In addition to LGBTQ+ bias, preliminary analysis of articles nominated for deletion indicates patterns of racism and anti-Semitism. For example, soccer players from Gambia who are not white passing were routinely nominated as non-notable despite extensive athletic coverage in international newspapers. Given these patterns, further research is necessary in order to determine the extent to which intersectional patterns of oppression are mapped onto how a subject’s notability is assessed. Using the same dataset, future work could strengthen and expand on intersectional theory to test how race and ethnicity are factored into assessing the notability of human subjects.
My findings also indicate the need for a more robust discussion on the limitations of pronoun use. As gender clearly shapes the perceived notability of subjects (either explicitly or implicitly), we must consider the way in which pronouns amplify one’s gender. Pronoun dichotomies (him or her) not only maintain heteronormative standards but also act as an easy signal for those evaluating one’s contributions. While I recognize the importance of highlighting women’s accomplishments throughout history, tackling the complexity of pronoun use would be a valuable contribution that other social scientists might wish to consider.
Finally, my data indicate that more research is needed regarding the notability of articles deleted from Wikipedia. Examples like Donna Strickland and Lois K. Alexander Lane further complicate the notion that women are “vanishing” from our historical memory (Luo et al., 2018). Future research must test the extent to which notable women are not just disappearing into thin air, but rather, are actively being erased.
Footnotes
Appendix
Subset of data to demonstrate how data were cleaned up and coded—URLs active. Script for pulling data and full *raw* dataset available upon request.
Acknowledgements
The author would like to thank and credit Eric Rochester for his time and innovation in data science.
Declaration of conflicting interests
The author has no conflicting interests.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The metadata procured for this project was created as part of a Dissertation Fellowship funded by the Scholars’ Lab at the University of Virginia.
Statement of agreement
The sole author agrees to submission. This article is not currently being considered for publication by any other journal.
