Abstract
Digital platforms produce bias and inequality that have a significant impact on people’s sense of self, agency and life chances. Wikipedia has largely evaded the criticism of other algorithmic systems like Google search and training databases like ImageNet, but Wikipedia is a critical source of representation in our current era – not only because it is one of the world's most popular websites, but because its data are being used as training data for the AI systems that are increasingly used for decision-making. We conducted an analysis of Wikipedia biographies in a national context, comparing the temporality and subjects of notability between English Wikipedia and the Australian Honours system in order to understand Wikipedia's unique role in the production of notability over the site's 20-year history. Framing Wikipedia as an active producer (rather than a reflection) of notability, we demonstrate that women are more likely to be awarded a Wikipedia page after the award announcements or not at all if their contribution is for labour relating to the caring professions than if their service is for sports, arts and films, politics or the judiciary. We argue that Wikipedia's inability to recognise gendered care work as noteworthy is mirrored in its own practices.
Introduction
Data and artificial intelligence systems are rapidly being integrated into every aspect of daily life. Data systems not only enable us to do things (banking, dieting, learning, driving and dating, for example) more efficiently, they also come to stand in for us – to represent us in ways that are increasingly accepted as more accurate and truthful than other sources (Beer, 2017: 1; Munn et al., 2023). Like all systems of representation, data and AI systems reflect biases that can have serious effects on our ability to live productive lives that are free from inequality and prejudice. As data and AI systems continue to deliver us into an age of deep mediatisation (Hepp, 2020), digital media and their infrastructures have become constitutive of the social worlds in which we live.
Research on representation and inequality in technological systems has highlighted the existence of systemic biases in various technologies such as search engines (Noble, 2018), municipal planning tools (Safransky, 2020) and facial recognition tools (Introna and Wood, 2004; Buolamwini and Gebru, 2018) that contribute to widening social and economic inequalities in an era of historic imbalances. This work, in turn, stems from a long history of research that evaluates the relationship between women, technology and knowledge production (Cockburn, 1988; Wajcman, 2004). Noble (2018), for example, demonstrates how her searches for ‘Black girls’ on Google resulted in links to pornographic websites and sexualised images and concluded that such representations can impact the self-esteem, identity formation and mental well-being of Black girls and women as well as influence how others view women of colour. In ‘Weapons of Math Destruction’ (2016), O’Neil examines how algorithms can disproportionately impact marginalised communities, resulting in individuals being unfairly denied opportunities such as jobs, loans or educational opportunities based on factors like race, gender or socioeconomic background. And in ‘Geographies of Algorithmic Violence: Redlining the Smart City’, Safransky argues that municipalities’ use of data-driven analytics and algorithmic planning constitutes a kind of ‘algorithmic violence – a repetitive and standardized form of violence that contributes to the racialization of space and spatialization of poverty’ (2020: 200).
These authors have written about the ways in which digital platforms encode biased representations that claim authoritativeness and reveal that the ways in which data represent us are not only significant semantically but also are materially important for our life chances. As Crawford and Paglen write, ‘Representations aren’t simply confined to the spheres of language and culture, but have real implications in terms of rights, liberties, and forms of self-determination’ (2021: 1115). Crawford and Paglen made these conclusions after a study of ImageNet, a large visual database used to automate object recognition in a variety of deep learning projects. The database consists of more than 14 million images that have been hand-annotated to indicate what objects are pictured in which of more than 20,000 categories. Crawford and Paglen undertook an ‘archaeology’ of ImageNet (2020) to examine the taxonomies and labelling practices that result in, for example, a woman smiling in a bikini labelled as ‘slattern, slut, slovenly woman, trollop’ and a child wearing sunglasses as a ‘failure, loser, non-starter, unsuccessful person’ (2021: 1105). They write that ‘Datasets aren’t simply raw materials to feed algorithms, but are political interventions’ (Crawford and Paglen, 2021: 1113) and that it is critical to understand the architecture and contents of the training sets used in AI because ‘(t)hey can promote or discriminate, approve or reject, render visible or invisible, judge or enforce’ (2021: 1115).
ImageNet is a training dataset for a myriad AI and machine vision applications that inherit its shortcomings. If ImageNet is a highly influential dataset for training machine vision, Wikipedia is its parallel for training textual systems. Wikipedia (or more accurately Wikimedia, the family of websites operated by the Wikimedia Foundation) is widely used by the computing industry for training and evaluating natural language processing (NLP) models such as language models, text classification and sentiment analysis. Wikipedia's structured content has been used to populate Google's knowledge panels in ways that are much more dependent on Wikipedia than would appear from Google's scant attribution of them (McMahon et al., 2017), and a Washington Post investigation (Schaul et al., 2023) found that Wikipedia was the second most popular source of data for ChatGPT. Wikipedia's opening paragraphs and structured data are also used as answers to questions by Question and Answering (QA) systems embedded in the most popular virtual assistants around the world, i.e. Amazon's Alexa, Apple's Siri and Google's Assistant (Ford, 2022; Lewandowski and Spree, 2011), and for teaching AI systems more broadly (Dinan et al., 2019; Robitzski, 2017).
Wikipedia is a volunteer-driven website that exists in more than 300 languages. Despite the early promise that Wikipedia would enable the democratisation of knowledge and widespread, global participation in its production, a decade of research has shown that Wikipedia suffers from bias against women, minorities and indigenous knowledge, that articles, topics and contributors are dominated by the United States and Western Europe (Graham et al., 2015), that topics with a female audience are weakly represented (Lam et al., 2011; Reagle and Rhue, 2011) and that women make up a tiny proportion of Wikipedia editors at about 16% of the editor population (Hill and Shaw, 2013).
Wikipedia's gender bias is profoundly skewed both in terms of participation and representation. Scholars have investigated the source of this skew in differences in Internet skills among men and women contributors (Shaw and Hargittai, 2018) and a culture that is emotionally demanding, conflict-ridden and sometimes abusive towards female participants (Collier and Bear, 2012; Menking and Erikson, 2015; Raval, 2014; Shane-Simpson and Gillespie-Lynch, 2017). In terms of representation, a cluster of research has investigated how Wikipedia represents women and men in its biographical articles as a means of measuring and diagnosing its bias problems. This research has found that biographies about women are less well developed than those of men, that Wikipedia's biographies about women are less likely to contain structured data or hyperlinks that connect them to other central articles (Langrock and González-Bailón, 2022), and that terms relating to gender, family and relationships are more frequently associated with female biographies, while men are more frequently associated with categories like politics and sports (Wagner et al., 2016, p. 14).
A key theme in this cluster is whether the tiny proportions of women and non-binary individuals in Wikipedia's database are more notable than the men. Demonstrating this feature would show how there is a glass ceiling effect on Wikipedia, where only the most notable women are deemed important enough for a page. Wagner et al. (2016), for example, examined all female biographies on English Wikipedia and compared them to Google search data to argue that, on Wikipedia, women are on average more notable than men according to the Google search data, and concluded that editors impose a higher notability threshold on women than men. Adams et al. (2019) found contradictory results in the coverage of American sociologists. Examining their H-indexes, for example, they found that male sociologists covered on Wikipedia have a median H-index of 27, whereas women's median H-index was lower at 22 (Adams et al., 2019: 8). Tripodi (2021), on the other hand, found that women are more often targeted for deletion through the nomination process, even if they ended up not being ultimately deleted because they were determined to meet the notability criteria. Miscategorisation of women's biographies as non-notable, she argues, ‘sheds light on another dimension of the emotional labor that editors endure when trying to close the Wikipedia gender gap’ (Tripodi, 2021: 4–5).
While the majority of research on Wikipedia's biographies has examined English Wikipedia, Konieczny and Klein (2018) examined the ratio of women and non-binary individuals to total biographies in the 285 language versions available at the time of their study. They found that Wikipedias in Confucian- and South Asian-cluster languages have higher ratios of female to male biographies than Wikipedias from other language clusters, but that this improvement was masking a different type of gender bias. Female biographies from these regions are more likely to be of celebrities (actresses, musicians, etc.) than those from other (e.g. English) Wikipedias, so that simply measuring the number of female (and non-binary) biographies does not adequately measure gender equality in workforce and politics.
We expanded on this research to further investigate the types of labour and significance that Wikipedia rewards. Rather than attempting to compute an objective measure of notability, we focussed on the temporality of article creation in relation to the external signals of notability that Wikipedia relies on to justify the creation of new articles. How does Wikipedia respond to external systems of notability in a local context? What kinds of women are favoured by Wikipedia in the creation of new articles? What unique biases can be attributed directly to Wikipedia, rather than to the world outside Wikipedia, which it mirrors only imperfectly?
Our study produces a dataset of Order of Australia awardees with English Wikipedia articles over time and according to gender. This work contributes to the aims of Big Data & Society to examine the consequences for how societies are represented by (big) data platforms. In doing so, we make three contributions to the study of bias and representation in digital platforms. The first is to highlight the importance of moving beyond simply counting gendered representation on Wikipedia to understand what types of labour and significance Wikipedia favours – care labour in this case. Moving beyond bias against women to understand how feminine labour and subjectivities are at stake in digital representation is highly relevant when digital platforms produce further barriers in recognising women ‘who have already established their credibility in a patriarchal system of accreditation’ (Luo et al., 2018). The second is to demonstrate the importance of temporal studies of Wikipedia in a particular locale. It is at a local level where we can examine contextual features that play a significant role in representation on digital platforms and where such gaps can often be more adequately acted upon (using institutional support networks such as Wikimedia's national chapters in this case). The third contribution is to highlight how Wikipedia produces rather than mirrors bias through its independent production of notability. Recognising how Wikipedia implements its own rules, norms and practices in deciding who it should represent is vital for advancing targeted responses to the inequalities that result.
Notability as a target for gender bias research
When Donna Strickland won the Nobel prize for physics in 2018, it was revealed that she had been denied a Wikipedia page prior to winning the prize because editors had determined that she was not notable enough (Cecco, 2018; Washington Post, 2018; Wikimedia Foundation, 2018). After encountering these critiques, the Wikimedia Foundation's then-executive director, Katherine Maher, wrote on Twitter: ‘Journalists—if you’re going to come after @Wikipedia for its coverage of women, check your own coverage first. We’re a mirror of the world's biases, not the source of them. We can’t write articles about what you don’t cover’ (sic @krmaher, 3 October 2018).
Maher later apologised for her misstep, acknowledging that Wikipedia is also the source of ‘systemic bias’, but that the community was working hard to mitigate it. The ‘mirror theory’ dominates discourse about Wikipedia's role in representation and knowledge construction (Ford, 2022). It is also reproduced by some studies of gender gaps on the encyclopaedia that highlight how Wikipedia demonstrates gender biases that mirror biases outside of Wikipedia (e.g. see Jemielniak, 2016; Wagner et al., 2016: 22).
The measurement and production of notability is, however, articulated in unique ways by the entities responsible for such tasks. Academic recruitment, tenure and promotion committees all ‘routinely grapple with the question of how to assess and measure the quality of scholarship’ (Adams et al., 2019: 3). They codify these decisions into rules, policies and documents that are, in turn, affected by historical precedents, institutional strategies and technical logics. Like other entities responsible for the measurement and production of notability, Wikipedia has set out its own rules and guidelines to help editors determine whether an individual or group should have their own article on Wikipedia.
According to the policy, a person does not need to have made a significant contribution, but instead needs to be ‘remarkable’ to have a Wikipedia page (Wikipedia contributors, 2023a). For example, criminals as well as individuals awarded the Nobel prize may have Wikipedia pages. One can be notorious, not only distinguished, to have a page on Wikipedia. Wikipedia's notability policies declare that a person's notability should be evaluated according to external signals rather than through any internal assessment. People are notable when they ‘have gained sufficiently significant attention by the world at large and over a period of time’, and when that attention can be verified according to what editors regard as ‘reliable and independent sources to gauge this attention’ (Wikipedia contributors, 2023b). This means that ultimately it is up to individual editors to decide what constitutes ‘sufficiently significant’ attention, which period of time is long enough and which sources are ‘reliable’.
Despite this, the decision about whether an article warrants a page is made on Wikipedia without a recognition that Wikipedians are making individual judgements. According to the policy, Wikipedia offers a ‘neutral point of view’ of its subjects. It performs ‘no original research’ because all statements on Wikipedia have to be backed up by an external ‘reliable source’ rather than using individual editors’ own judgement. However, content cannot be copied from external sources for copyright reasons, so Wikipedia editors must summarise sources, evaluate their ‘reliability’ and prioritise certain statements of fact above others. This work is assumed to take place neutrally without the existence of any editorial adjudication. However, as Tkacz (2012: 92) argues:
While outside battles for truth are explicitly rejected – ‘The threshold for inclusion in Wikipedia is verifiability, not truth’ – Wikipedia nonetheless has a whole body of forceful statements whose function is to establish the truth of any particular statement; a truth of what is neutral, non-original, published, reliable, attributable and verifiable.
Wikipedians do the work of classifying knowledge about the world. In doing so, they actively participate in the construction of what constitutes reliable knowledge – including knowledge about individuals and groups. As Adams et al. (2019: 3) wrote in relation to English Wikipedia's coverage of American sociologists, assessments of quality are locally variable and the assessment of source reliability is at least to some extent a matter of perspective. ‘Although there are certainly more or less informed or expert points of view about the reliability of a given academic source, there is, of course, no neutral viewpoint independent of the observer’.
Adams et al. (2019) studied how English Wikipedia represents American sociologists and found that the differences in academic rank, length of career and notability (measured with both H-index and departmental reputation) between men and women sociologists, and between white and nonwhite sociologists, ‘explain only about half of the differences in the likelihood of being represented on Wikipedia’ (Adams et al., 2019: 1). In their ethnographic research, Gauthier and Sawchuk (2017) found that articles about women are rejected from English Wikipedia by editors who use policies to exclude feminist perspectives in a manner that is both inconsistent and arbitrary. And in an ethnographically informed quantitative study, Tripodi (2021) found that female subjects were more likely to be targeted for deletion on English Wikipedia. Despite making up less than a fifth of English Wikipedia biographies, women make up a quarter of biographies nominated for deletion every month (Tripodi, 2021: 10). Tripodi argues that the application of Wikipedia's notability guidelines therefore plays ‘a critical role in the perpetuation of gender inequality on the site’ (Tripodi, 2021: 2).
In this study, we compare English Wikipedia's biographical representation to that of the Order of Australia awards. These awards are important because they are the official platform for the national recognition of outstanding service and contribution in Australia. For most of the twentieth century, Australians were recognised through the (British) Imperial system of honours and awards for community and military service and bravery, which was administered relatively independently in each country of the Commonwealth. Although in Australia proposals were made as early as 1949 to initiate a national award, it was not until 1975 that an Australian Honours system was established. Modelled on the Canadian system, the new Order of Australia had four levels of recognition (Companion, Officer, Member and Medal of the Order), none of which carried a title.
The Order of Australia receives public nominations, but convenes a panel of representatives to judge a person's merit. Honours awards tend to be announced four times a year – the Order of Australia on the 26th of January and for the Queen's Birthday in June, and Australian Bravery Decorations in April and August – although occasionally special ceremonies are held outside this timetable. Candidates are notified that they are receiving an award up to two months before the lists are publicly announced, and award ceremonies are subsequently hosted by state governors and the Governor-General. The award consists of an insignia representing the division and level of the order and permission for recipients to use their designation letters after their name (such as ‘AC’ for Companion, ‘AO’ for Officer, etc.).
Like Wikipedia, Australia's national honours system has been periodically criticised for its skewed representation. A review into such questions was held in 1995 (Review of Australian Honours and Awards, 1995); it highlighted several problems that have endured: from political partisanship and the under-representation of migrant and indigenous groups, to the poor gender balance and a geographical distribution that is weighted towards urban recipients (Fox, 2022: 223). Carol Schwartz, the founding chair of the Women's Leadership Institute Australia, was particularly focussed on the lack of representation of women and began a campaign to increase the nomination of women that subsequently led to the Honour a Woman initiative (Our Community, 2017). It was only in June 2023 that, for the first time since the Order of Australia was established in 1975, the majority of recipients in the General Division were women. The absence of indigenous design and the continued presentation of the awards on the 26th January, both issues identified in 1995 as likely to ‘contribute to the alienation of indigenous Australians’, remain unchanged (Review of Australian Honours and Awards, 1995).
For historians Fox and Furphy, questions about ‘the politics of national recognition’ and ‘what it may mean for honours to be “truly Australian”’ are inherent in the system itself (Fox and Furphy, 2017: 103). The (short-lived) resurrection of the titles of knights and dames in the Order of Australia in 2014 was also widely criticised as carrying too much imperial baggage to be an appropriate way to recognise individuals’ service in twenty-first century Australia. Controversies over individual awards have been perennial (Fox, 2022: 248). While such controversies ultimately point to profound questions about societal understandings of merit, value and distinction, the passionate public debate that erupted when knights and dames were reintroduced suggests that the Order of Australia is also seen as having an important social role – one that is worth defending and making more representative.
Theoretical scaffolding
Both English Wikipedia and the Order of Australia awards represent biases and gaps that their stakeholders are seeking to overcome. Both systems independently determine who should be recognised, but there are some potential dependencies worth examining further. Wikipedia relies on the external signals of notability to determine whether someone should be the subject of a Wikipedia article, and the Order of Australia awards are a potential trigger for such recognition – particularly for Australian Wikipedians. And the Australian Honours system might be influenced by the existence and content of Wikipedia articles about nominees: having a Wikipedia page written about a nominee could bolster the case of that nominee, particularly for under-represented groups. In this study, we examine the first relationship by asking questions about the timing of article creation and awards announcements and the type of labour for which awards are being granted. In doing so, we determine whether the awards are a trigger for page creation and whether there are any patterns in the types of people that Wikipedia awards a page that are different from who the Order of Australia recognises.
We apply a theory of knowledge which sees it as a situated practice that is historically and socially produced (Burke, 2012; MacKenzie and Wajcman, 1985). This framework holds that it is not only the identity of Wikipedia's editors, but rather their daily practices that shape representation (Ford and Wajcman, 2017). Those practices include (a) automated processes that mediate all interactions on the platform (its
Methods
Our study compared the temporality, gender and subject area for two datasets: English Wikipedia and the Order of Australia awards since Wikipedia's founding in 2001. The Order of Australia awards provides an opportunity to examine the key justification for a person's recognition beyond gender in a way that would shed light on Wikipedia's particular biases in selection and in the context of a national sphere where the results could be more contextually explained and responded to. It is important to note that we are not comparing English Wikipedia and the Honours Awards to see which is ‘better’ or which determines notability more accurately. Both Wikipedia and the Honours system are subject to gaps and biases. Both find unique ways of implementing importance in who they reward as either ‘notable’ or ‘remarkable’ (Wikipedia) or for ‘service worthy of a particular recognition’ (Order of Australia).
We ask three questions in order to understand practices of sorting notability on Wikipedia:
Our methodology is inspired by data feminism principles articulated by D’Ignazio and Klein (2020) that point to the need to attend to data's socio-political shaping. In particular, we focus on the principles of embracing pluralism and considering context. As D’Ignazio and Klein (2020) point out, ‘the most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, indigenous, and experiential ways of knowing’, the data ‘are not neutral or objective’ and the production of data is the result of unequal social relations that need to be revealed.
Our method for analysing these data and reflecting on our own practice of producing and visualising relied on a group of scholars who argued that data analysis itself shapes what is discoverable. The study is the result of more than two years of iterative data analysis, conducted first in collaboration with members of the Australian Wikipedia and Wikidata community and then in conversation with other scholarly research on the topic. Our analysis is based on the comparison of two datasets. The first dataset represents people who have been awarded an Order of Australia over the awards’ history. This dataset is sourced from the Australian Honours Search Facility and is maintained by the Australian Department of the Prime Minister and Cabinet. Our dataset represents 45,606 honours awarded between February 1975 and January 2023. Our second dataset is sourced from Wikidata, and draws records based on a search of all people who have been awarded an Order of Australia on Wikidata. Wikidata hosts data about all English Wikipedia pages and is therefore an effective mechanism for making queries about the content of English Wikipedia. This dataset contains 5856 records. Both datasets have a ‘one-to-many’ relationship as people over time can receive multiple Orders. Once multiple awards are accounted for, the Order of Australia dataset represents 45,166 individuals, and the Wikidata dataset represents 5505 individuals. 1
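The step of accounting for multiple awards per person can be illustrated with a minimal sketch. It assumes each award record carries a stable person identifier, which is an assumption made here for illustration; the records, identifiers and field layout below are hypothetical and do not reproduce the study's data or matching procedure.

```python
from collections import defaultdict

# Hypothetical award records: (person_id, award_level, year).
# Identifiers and layout are assumptions for illustration only.
awards = [
    ("p1", "AM", 1990),
    ("p1", "AO", 2005),   # same person, later award
    ("p2", "OAM", 2010),
    ("p3", "AC", 1988),
    ("p3", "AO", 1980),
]

def group_by_person(records):
    """Collapse a one-to-many list of awards into one entry per person."""
    by_person = defaultdict(list)
    for person_id, level, year in records:
        by_person[person_id].append((year, level))
    # Sort each person's awards chronologically.
    return {pid: sorted(items) for pid, items in by_person.items()}

people = group_by_person(awards)
n_awards = len(awards)   # number of honours (rows)
n_people = len(people)   # number of distinct individuals
```

In this toy example five honours collapse to three individuals, which is the same kind of reduction as the one from honours to individuals reported above.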
From the Wikidata entry, we were able to link to an English Wikipedia biography page if it existed, and from there establish the page creation date. We used this mechanism primarily to work out if a person's page was created before or after they received their Order of Australia. From our initial 5505 Wikidata records, we recorded 4833 Wikipedia pages. Gender was allocated to each person on the Australian honours dataset using the Gender R package (Mullen, 2021) which infers gender based on the first name. Due to the inherent limitations of this process, output was then checked manually, and names that were not recognised were allocated a gender based on a manual internet search.
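The before/after comparison described here amounts to comparing two dates per person. The sketch below shows one way this could be expressed; the function name, the example dates, and the choice to treat a page created on the award date itself as "after" are assumptions for illustration rather than the study's documented rules.

```python
from datetime import date

def page_timing(page_created, award_date):
    """Classify a biography relative to the award date.

    Returns "no page" when no English Wikipedia page was found.
    A page created on the award date counts as "after" (an assumption).
    """
    if page_created is None:
        return "no page"
    return "before" if page_created < award_date else "after"

# Hypothetical examples (not real recipients).
example_before = page_timing(date(2005, 3, 1), date(2010, 1, 26))
example_after = page_timing(date(2012, 6, 12), date(2010, 1, 26))
example_none = page_timing(None, date(2010, 1, 26))
```

For people holding several Orders, the analyst would also need to decide which award date to compare against; that choice is left open in this sketch.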
Recognising the historical and socially constructed character of gender, we follow other feminist scholars in the social sciences in analysing historical women, or individuals that have been understood as women (Owens and Rietzler, 2021). In doing so, we acknowledge the various ways in which those identifying as women have employed, resisted and negotiated gender norms. In relation to non-binary individuals, we are aware of six transgender recipients who identify as women. This includes Dr Clara Tuck Meng Soo who returned her Medal of the Order of Australia to protest Margaret Court being awarded a Companion (the highest order) in 2021, and Cate McGregor who is still listed for her Member of Australia recognition on the Honours Search Facility under her dead name.
We examine the kinds of achievements that are recognised in Wikipedia and the Order of Australia by examining the citation given for each Order. We focus here on women recipients in our quantitative analysis because we want to compare public gender identity features and their relationships to notability at scale. Frequencies were calculated for each term used in the citations, and then organised according to those who had a page created for them before and after receiving their Order of Australia. Finally, we analysed these subgroups according to their professions or types of services to understand what types of services are most likely to be recognised (either before or after the award signal).
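A minimal sketch of the term-frequency step is given below, assuming citations are short free-text strings and that each recipient has already been assigned to a "before" or "after" group. The citations, the stopword list and the tokenisation rule are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# A small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"for", "to", "the", "of", "and", "in", "as", "a", "an"}

def term_counts(citations):
    """Count lower-cased alphabetic terms across citations, skipping stopwords."""
    counts = Counter()
    for text in citations:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Hypothetical (group, citation) pairs, not taken from the Honours data.
records = [
    ("before", "For service to the arts and to film"),
    ("after", "For service to nursing and to the community"),
    ("after", "For service to the community through aged care"),
]

by_group = {}
for group, citation in records:
    by_group.setdefault(group, []).append(citation)

# One frequency table per group, ready for comparison.
freqs = {group: term_counts(texts) for group, texts in by_group.items()}
```

Comparing `freqs["before"]` with `freqs["after"]` then shows which terms are concentrated in each subgroup, which is the comparison the paragraph above describes.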
In the next section, we draw out our findings in relation to the project's three core research questions. We conclude with a discussion of Wikipedia as a unique system of notability by exposing the relationship between Wikipedia and a single system of recognition.
Findings
RQ1: How does Wikipedia represent Order of Australia winners?
There are approximately 47,000 pages on English Wikipedia for Australian citizens compared to 43,705 people who hold an Order of Australia. Digging more deeply into the overlap between Wikipedia and the Order of Australia, we found that there are 4833 people who have both a Wikipedia page and hold an Order of Australia. While only 11% of all Order recipients have a Wikipedia biography, the higher the level of award, the more likely a recipient is to also have a biography on Wikipedia. The majority (87%) of AC (Companion of the Order of Australia, the highest award) recipients, for example, have a Wikipedia biography, but only 4% of those with an OAM (Medal of the Order of Australia, the lowest award) have a page on Wikipedia (Figure 1).

Proportion of the Order of Australia recipients with a Wikipedia page.
Overall, the representation of Australians holding an Order of Australia on Wikipedia is high for the level of Companion (35 awarded for ‘eminent achievement’ at each announcement). However, more than half of the Officers (140 awarded for ‘distinguished service of a high degree’ at each announcement) are not represented on Wikipedia. Members (365 awarded for ‘service in a particular locality or field’ at each announcement) have a very low representation. The Medal of the Order (‘service worthy of a particular recognition’) has no limits and to date makes up more than 60% of total Honours awarded. Just less than 5% of these people have a Wikipedia biography. Often, these awards go to community members for hyper-local contributions or service, and they may not easily pass Wikipedia's notability requirements.
When gender is taken into account (Figure 2), we see stronger proportional representation for women in the higher Order levels (97% of Companion female awardees have a Wikipedia page versus 85% male Companion awardees), but this needs to be considered in the context of only 122 women who hold a Companion compared to 503 men.

Proportion of the Order of Australia recipients with a Wikipedia page, by gender (male and female).
Over time the number of women receiving an Order of Australia has increased (Figure 3). Women and men recipients are not at parity yet, but there has been a gradual increase since women made up only about 20% of the total awardees in 1975.

Proportion of male and female recipients in the Order of Australia awardees over time.
Up until around 2015, women's share of the Wikipedia biographies created for Order of Australia recipients hovered between 20% and 30%, but in the past seven years, this proportion has jumped significantly, reaching more than 78% in 2021. While the number of Order of Australia awardees has increased over time, the numbers of Wikipedia pages for awardees are in decline (Figure 4). The majority of awardees’ pages were created between 2004 and 2008, but there has been a significant decline in page creation for Order of Australia recipients since 2014. Women are receiving a larger proportion of a shrinking base of the created biographies.

Number of Wikipedia pages created for female and male Order of Australia awardees over time.
Wikipedia's representation of Honours awardees was relatively equal for male and female recipients in Wikipedia's first five years (Figure 5). By January 2006, 25% of the total pages for male recipients had been created, with women not far behind: 25% of their pages had been created by December of that year. Over the next few years, the number of pages created for men took off and significantly outpaced the creation rate of women's pages. Within two more years, 50% of pages for men had been created, while women had to wait until January 2012 for 50% of their pages to be created. By May 2013, men had three quarters of their total biographies created. Women did not reach that mark until September 2017. The pace of page creation for women has certainly been revived, and page creation for men appears to be levelling off. However, the rapid creation of pages for men between 2006 and 2008, when it was arguably easier to create new pages on Wikipedia, has contributed to the smaller proportion of Wikipedia pages about women today. This is exacerbated by the smaller number of women in total who hold an Order of Australia: there is simply a smaller pool of women to write pages about.

Cumulative creation of Wikipedia pages for Order recipients by gender and time.
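The milestone dates reported above (the points at which 25%, 50% and 75% of each gender's pages had been created) can be derived from a sorted list of page creation dates. The following is a minimal sketch in Python, not the authors' code; the function name `milestone_dates` and the toy dates are hypothetical and are not drawn from the study's dataset.

```python
import math
from datetime import date


def milestone_dates(creation_dates, fractions=(0.25, 0.5, 0.75)):
    """For each fraction, return the date by which that share of pages existed."""
    ordered = sorted(creation_dates)
    n = len(ordered)
    result = {}
    for f in fractions:
        # Position of the page whose creation lifts the cumulative share to >= f
        k = max(math.ceil(f * n), 1)
        result[f] = ordered[k - 1]
    return result


# Hypothetical toy data for illustration only
pages = {
    "men": [date(2005, 3, 1), date(2006, 1, 15), date(2007, 6, 1), date(2008, 2, 1)],
    "women": [date(2006, 12, 1), date(2012, 1, 10), date(2017, 9, 5), date(2020, 4, 1)],
}

milestones = {gender: milestone_dates(dates) for gender, dates in pages.items()}
for gender, marks in milestones.items():
    print(gender, {f: d.isoformat() for f, d in marks.items()})
```

Comparing the dates returned for each group at the same fraction gives the kind of lag described in the text, where one group reaches a milestone years after the other.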
RQ2: Is Wikipedia dependent on the Order of Australia as a signal of notability?
Given that Wikipedia page creation relies on external forms of verification, we next set out to explore when Wikipedia grants Honours recipients a page in relation to them winning the award. Examining the whole cohort independent of the exact dates of awards, we found that most Honours recipients’ Wikipedia pages were created after the person received their Honour, simply because so many awards were in circulation when Wikipedia originated in 2001 (Figure 6).

Order recipients with a Wikipedia page created before or after the Order was received.
If we remove all those who received an award prior to the start of Wikipedia, we see that the proportions shift: we see an almost even split between those who had their page created before or after they received their Order. Female Honours recipients make up a slightly higher proportion (52% female versus 47% male recipients) of those who had a page created for them after their award (Figure 7).

Order recipients after 2001 with a Wikipedia page created before or after the Order was received.
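The before/after comparison, and the effect of excluding awards that predate Wikipedia, can be sketched as follows. This is an illustrative Python example under stated assumptions rather than the study's actual procedure: the record layout (award date, page creation date), the function name `before_after_split` and the toy records are all hypothetical, and a page created on the day of the award is counted as 'after'.

```python
from collections import Counter
from datetime import date

WIKIPEDIA_START = date(2001, 1, 15)  # Wikipedia's launch date


def before_after_split(records, min_award_date=None):
    """Share of pages created before vs. after the award.

    records: iterable of (award_date, page_created_date) pairs.
    min_award_date: if given, awards made earlier than this are excluded.
    """
    counts = Counter()
    for award_date, page_date in records:
        if min_award_date is not None and award_date < min_award_date:
            continue
        # Assumption: same-day creation is treated as 'after'
        counts["before" if page_date < award_date else "after"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("before", "after")}


# Hypothetical toy records for illustration only
records = [
    (date(1990, 1, 26), date(2005, 5, 1)),   # award predates Wikipedia
    (date(1995, 6, 12), date(2006, 3, 3)),   # award predates Wikipedia
    (date(2010, 1, 26), date(2008, 7, 1)),   # page existed before the award
    (date(2015, 6, 8), date(2015, 6, 9)),    # page created after the award
]

whole_cohort = before_after_split(records)
post_wikipedia = before_after_split(records, min_award_date=WIKIPEDIA_START)
print(whole_cohort)
print(post_wikipedia)
```

In the toy data, the whole cohort is dominated by pages created after the award because older awards could only ever be followed by a page; restricting to awards made after Wikipedia's launch removes that artefact.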
Narrowing in on the exact dates of the Order of Australia announcements (in January and June every year), we examined when a page is created in relation to the date of the Order announcement. We found that the announcements of the awards on the 26th January and in June every year act as a direct trigger for page creation. There was a noticeable jump in page creation in the week of the Order announcements (represented as ‘week 0’ in Figure 8), when 53 biographies were created for Order recipients. Several other peaks correspond to the dates of edit-a-thons, and other initiatives such as the project to increase the representation of Australian Paralympians on the site (Wikipedia contributors, 2023c). All of the Paralympians represented are also Order of Australia recipients.

Number of Wikipedia pages created for the Order of Australia recipients before and after the Order announcements.
The week of the announcement is a strong signal for editors to create biographies overall. Fifty-three Wikipedia pages for Order recipients were created in the week of the Order announcement compared to just under two in other weeks. However, the signal was the strongest for individuals who were honoured at the Officer and Member levels. The heat map in Figure 9 shows the intensity of biography creation for each level of Order recognition.

Heat map showing Wikipedia pages for Order recipients created 52 weeks pre- and post-announcement according to gender.
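The 'week 0' framing used in these figures amounts to expressing each page's creation date as a whole-week offset from the recipient's announcement date and counting pages per offset. The sketch below shows one way this could be computed in Python; it is not the authors' implementation. The function names, the choice to define week 0 as the seven days starting on the announcement date, and the toy records are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def week_offset(award_date, page_date):
    """Whole weeks between announcement and page creation.

    Week 0 covers the announcement day and the six days after it;
    negative values are weeks before the announcement.
    """
    return (page_date - award_date).days // 7


def weekly_histogram(records, window=52):
    """Count pages per week offset, keeping only offsets within +/- window."""
    histogram = Counter()
    for award_date, page_date in records:
        offset = week_offset(award_date, page_date)
        if abs(offset) <= window:
            histogram[offset] += 1
    return histogram


# Hypothetical toy records: (announcement date, page creation date)
records = [
    (date(2020, 1, 26), date(2020, 1, 26)),  # same day -> week 0
    (date(2020, 1, 26), date(2020, 1, 28)),  # two days later -> week 0
    (date(2020, 1, 26), date(2020, 2, 3)),   # eight days later -> week 1
    (date(2020, 1, 26), date(2020, 1, 20)),  # six days earlier -> week -1
    (date(2020, 6, 8), date(2020, 6, 9)),    # one day later -> week 0
    (date(2020, 1, 26), date(2015, 1, 1)),   # outside the 52-week window
]

histogram = weekly_histogram(records)
print(sorted(histogram.items()))
```

Grouping the same records by Order level or by gender before counting would give the rows of a heat map like the one described.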
The difference in Wikipedia page creation before and after the Order announcement in relation to gender is pronounced (Figure 10). While there are ‘peaks’ for both male and female recipients, pages created for men are more dense and higher in the periods after the award announcement. While the total number of pages created for women during the week of the announcement is lower than the total number of pages created for men, women do benefit from having Wikipedia editors’ awareness raised about their achievements and service during the Honours announcement week.

Number of Wikipedia pages created for the Order of Australia recipients over time and in relation to the Order announcement week (week 0).
RQ3: For which genders and fields of work is the Order of Australia a signal of notability?
For our third and final research question, we were interested in whether the patterns in Wikipedia page creation could be explained by the types of service (labour) for which Order recipients were being commended. We know that the work of care (education, social work, child care and health work) is disproportionately undertaken by women, and we also know that this work is economically and socially undervalued (Duffy, 2011). What kinds of work were recognised as notable by Wikipedia and the Order of Australia? In order to answer this third research question, we examined the text from each individual's citation describing the notable service they had been recognised for. Focussing on female recipients, these citations were broken into single keywords and each word was tagged according to whether it belonged to an individual with a Wikipedia page or not, and if so, whether their biography was created before or after they received their Order of Australia (Figure 11).

Scatter plot to demonstrate the types of services for which Wikipedia awards a page in relation to the Order announcement date.
Comparing the citations of women who do not have a Wikipedia page with those of women who do reveals significant differences. Women who do not have a Wikipedia biography but do have an Order of Australia have more mentions in their citations of terms such as education (519 versus 203), medicine (314 versus 42), care (114 versus 29), nursing (104 versus 12), teacher (34 versus 10) and aged (29 versus 4). The citations of women who have an Order as well as a Wikipedia biography have more mentions of words such as sport (148 versus 4), arts (135 versus 64), gold (114 versus 1), industry (46 versus 13), parliament (51 versus 10), judiciary (35 versus 11) or film (24 versus 1). In other words, women whose service is for labour relating to the caring professions are less likely to have a Wikipedia article written about them than if their service is for sports, arts and films, politics or the judiciary.
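The keyword comparison described above (splitting citations into single words and counting each word separately for recipients with and without a Wikipedia page) can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the study's code: the tokenisation rule (lower-cased runs of letters), the function names and the example citations are all hypothetical.

```python
import re
from collections import Counter


def keyword_counts(citations):
    """Count single-word keywords across a list of citation texts."""
    counts = Counter()
    for text in citations:
        # Assumption: a keyword is any run of letters, compared case-insensitively
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def compare_terms(without_page, with_page, terms):
    """For each term, report mentions in the two groups of citations."""
    no_page = keyword_counts(without_page)
    has_page = keyword_counts(with_page)
    return {
        term: {"without_page": no_page[term], "with_page": has_page[term]}
        for term in terms
    }


# Hypothetical citations for illustration only
without_page = [
    "For service to nursing and aged care",
    "For service to education and aged care",
]
with_page = [
    "For service to sport as a gold medallist",
    "For service to the arts and film",
]

comparison = compare_terms(without_page, with_page, ["care", "aged", "sport", "film"])
for term, counts in comparison.items():
    print(term, counts)
```

A fuller analysis would typically also remove common stop words (such as 'for', 'to' and 'service') and could split the with-page group further by whether the page predates the award.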
Discussion and conclusions
We examined the relationship between English Wikipedia and the Order of Australia and found that rather than distancing itself from decisions about what is notable and deferring to external signals, Wikipedia does not consistently rely on such signals. We examined Wikipedia's role as an independent producer of notability through the lens of a particular award system and the extent to which the award announcements trigger articles and for which genders and types of labour. We found that only a small proportion of those receiving an award is represented on English Wikipedia. Individuals at the lower levels of national recognition (Officer and Member) are proportionally under-represented on Wikipedia. Most importantly, women are more likely to have a Wikipedia article created for them after they have received one of the higher awards and if they have undertaken service related to sports and politics as opposed to one of the caring professions. In other words, English Wikipedia demonstrates a bias against women's care work as a valid source of notability, even when it is recognised as significant by an important national awards system.
Wikipedia serves to reinforce existing disparities between the kinds of gendered work and labour that are valued (and made visible) in societies like Australia. There is well-developed literature on the gendered dimensions of care work (Dodds, 2007; James, 1992; Kittay, 2013; Parks, 2003). In 2005, Pocock (2005: 32) identified in Australia ‘work/care regimes that are more or less hostile to the needs of paid workers who care for others’. Little has changed since then. Women remain over-represented in unpaid work sectors, taking on a greater proportion of unpaid caring, domestic and volunteer work than men. Women are also over-represented in the paid care and community domains. Occupations such as child care, personal care, nursing, clerical and sales work, domestic housekeeping, primary teaching, special education, social and community work, librarianship and hairdressing are all highly feminised. In all these fields, women constitute at least 70% of employees (Preston and Whitehouse, 2004). The value accorded to this care and service work is mostly discussed in terms of remuneration, which is itself highly unequal. Smith and Whitehouse, for example, point to ways wage setting systems reflect ‘the accumulation of structural inequalities and gendered practices’ (Smith and Whitehouse, 2020: 533).
In some ways, Wikipedia's inability to recognise gendered care work as noteworthy is mirrored in its own practices. Several researchers have argued that the labour required to maintain what Menking and Erickson call Wikipedia's ‘culture of knowledge production’ may itself be gendered. They detail ‘the strategic work women engage in to adjust their actions and/or feelings to sustain their participation in Wikipedia’ (Menking and Erickson, 2015: 208). As one of the editors they interviewed stated: the ‘only acceptable model for behaviour on Wikipedia is to behave like a man, which is to ignore all the bullshit’ (Menking and Erickson, 2015). Several studies have examined how Wikipedia's adversarial culture might negatively affect women (e.g. Ford and Wajcman, 2017; Kittur et al., 2007). Navigating this culture and striving to reform it, as Howard and Irani outline, is itself a form of labour and of care – something that is too rarely recognised by researchers and community members alike (Howard and Irani, 2019). As Ahmed observes in her study of diversity practitioners, the work of bearing harmful consequences is distributed unequally even in organisations seeking to improve their practices (Ahmed, 2017). Wikipedia is, in the words of Menking and Erickson, ‘a storied space of democratic values and meritocracy in action’, yet this ‘idealized veneer’ masks a set of practices that are gendered both qualitatively and quantitatively (Menking and Erickson, 2015).
On Wikipedia, the practice of care labour is unequally distributed to women. Our analysis of biographies in relation to an external signal of notability demonstrates that caring labour is similarly disregarded. This study, then, aligns and extends the work of data feminism by demonstrating how caring labour is de-prioritised in collaborative knowledge projects like Wikipedia. This matters because Wikipedia has come to stand in for what constitutes ‘public’ (rather than private) knowledge. It assumes this role when it is used as a source of data for multiple third party platforms. Wikipedia is a power-broker that determines representation in a host of other, sometimes even more powerful sites.
When Wikipedia fails to recognise care work which is ‘undertaken out of a sense of compassion or responsibility for others, rather than with a goal of monetary gain’ (D’Ignazio and Klein, 2020: 199), this has implications for opinion formation and decision-making in other contexts. As Langrock and González-Bailón write, gendered inequalities on Wikipedia ‘can have large effects for information-seeking behaviour across a range of digital platforms and devices’. For better or for worse, Wikipedia now shapes people's understanding of who and what is important, significant and notable (i.e. who is worth paying attention to). Having a biography on Wikipedia has become a signal of authority, of individual worth and importance.
What are the implications of these findings for Wikipedia in Australia? Through one lens of interpretation, the Order of Australia is crucial in acting as a stimulus for wider recognition on Wikipedia for gendered forms of work, which are disproportionately accorded less societal recognition in terms of pay and status. Recognition on one platform can produce recognition on another. The Honours system can therefore be used as a tool that draws attention to those individuals that Wikipedia has not hitherto recognised as notable, just as Wikipedia can be a tool to accord recognition to those the Honours system leaves out. However, from another lens, these two systems of notability (each with their own biases) reinforce each other: they are both systems in which women are recognised only when they fit particular standards. The Australian Honours system's practice of not recording gender data, and both systems’ practice of not recording other demographic data such as ethnicity, compound these problems still further.
As D’Ignazio and Klein (2020) argue, the solution to the inherently cultural and political nature of data is to use data to challenge power hierarchies and systems of counting and classification that perpetuate oppression. Ultimately these tools lie in the hands of Wikipedia editors and members of the Australian public, any of whom can submit a nomination to the Order of Australia or add an article to Wikipedia. A number of initiatives have been created to address this issue, including Women in Red, Women in Green, Wiki Loves Women, WikiGap, WikiWomen's Collaborative, WikiHerStory and Editona and non-profit organisations such as Art + Feminism or 500 Women Scientists. These groups could focus on including more individuals who are recognised by awards like the Order of Australia for caring labour. Such efforts can yield success in filling gaps in coverage. We found a significant influence of the History of the Paralympic Movement in Australia (HoPAu) project, for example, in improving the representation of female Paralympians (who are also Order recipients). At the same time, for Wikipedia editors wishing to increase the diversity of representation, focus could be placed on identifying and creating pages for Officer level women, given they have been honoured for ‘distinguished service of a high degree to Australia or to humanity at large’, which in most cases should surely meet Wikipedia's notability criteria.
However, as Langrock and González-Bailón write, we ‘acknowledge that solving existing gender gaps requires measures other than adding profiles of women to the platform’ (2022, p. 7). Despite the fact that adding female biographies is often met with significant hostility (Tripodi, 2021), ‘there is no easy technical “fix” [to bias problems] by shifting demographics, deleting offensive terms or seeking equal representation by skin tone’ (Crawford and Paglen, 2021: 1113). The project of categorising people as notable or not notable enough – as encyclopaedic or unencyclopaedic, as the subject of ‘enough’ attention or not – is a kind of politics. Central to that politics is a struggle about who gets to decide who is worthy and what constitutes the kind of valid knowledge by which those decisions are made.
Studies like ours emphasise the need for a more considered practice to acknowledge diversity in the representation of Wikipedia biographies more generally. Wikipedia pages for women are another building block for advancing a career in the same way that citations are the ‘academic bricks’ (Ahmed, 2017) that are used to measure a scholar's success, granting access to promotions, higher salaries, funding and invitations to speak at conferences. Zurn et al. (2022) found that undercitation of women scholars and scholars of colour can have a negative consequence for a scholar's career advancement, ‘hindering visibility, diminishing perceived prestige and stalling promotion’. For those working to advance academic equity through supporting citation diversity, it is important for everyone working in the system to be more reflexive about their own practice, rather than only women or those who identify as non-binary. Bricks ‘do not just fall into place; they are placed’. Anticipating the effects of citation imbalance and choosing to cite differently is part of what an ethical and deliberative scholarly practice should entail.
In the same way, the under-acknowledgement of women and people of colour on Wikipedia carries risks for those individuals, but also serves to narrow the scope of our knowledge about people and their expertise and experience (Zurn et al., 2022). The response should similarly be about encouraging a more deliberate and reflexive approach to editing across the encyclopaedia. Along these lines, Wikipedia, via the Wikimedia Foundation or WikiProject groups, could try to educate editors, particularly those who work in the policing of new articles, about what a more deliberate approach might look like, given Wikipedia's significant imbalances. Editors could intermittently be shown summaries of the results of research like ours before they proceed to editing, and be invited to reflect on their own practice of gatekeeping on the encyclopaedia. Wikipedia's biases can have very real-world implications – not only for the people who are acknowledged by Wikipedia, but also, in this case, for the possibilities of recognising the meaning behind gendered labour. Wikipedia is crucial for meaning-making, as much as it is for individual recognition. As the Art + Feminism project team state on their website: ‘When cis and trans women, non-binary people, and people of color, and Indigenous communities are not represented in the writing and editing of the tenth-most-visited site in the world, information about people like us gets skewed and misrepresented. The stories get mistold. We lose out on our real history’ (cited in Langrock and González-Bailón, 2022: 6).
Wikipedia is a data source for AI and machine learning systems and its representations reverberate through our information spheres. As Tripodi writes, the ways in which women subjects are represented on Wikipedia ‘holds wider implications than just Wikipedia representation’ (Tripodi, 2021: 5) with implications for ‘how a person's gender affects their perceived significance’ (Tripodi, 2021: 2). Although there has been significant work to map gender biases on the encyclopaedia, an important step is to recognise Wikipedia's unique role in shaping the world. Rather than a neutral source that perfectly reflects the world outside, Wikipedia needs to be recognised as an independent producer of notability. Wikipedia's editors, working within the rules of the system and aided and constrained by the tools available to them, actively produce notability. They use external signals like the Order of Australia as possible sources of evidence, but they alone decide when and how to mirror those signals. As such, they are privileged and powerful actors in a politics of knowledge that is far from open, transparent or harmless. The stakes of how they use that power are high.
Footnotes
Acknowledgements
The authors would like to acknowledge funding from the Australian Research Council's Discovery Programme (DP220100662) that made this work possible. They would also like to thank the reviewers for their suggestions, which greatly improved the early draft of the manuscript. They also acknowledge that this research was conducted on the Gadigal Land of the Eora Nation.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the ARC Discovery Project (grant number DP220100662).
