Abstract
From critiques of baked-in sexism in data science, to the use of data in the service of feminism, feminist data activism has emerged as a new form of feminist activism. This paper approaches feminist data activism from a data imaginary perspective, focusing on a prominent feminist initiative from Australia called She's A Crowd, an organization that claims to have crowdsourced the world's largest dataset of gendered violence. Through interviews with 11 participants who volunteered their “datafied stories” to the organization, I explore the grassroots imaginaries about what data is and what it can do for the collective struggle against gendered violence. I show that participants’ experiences with not being believed led them to see data-driven stories as having superior epistemic value over qualitative narratives. Paradoxically, even when data is viewed as superior due to its detachment from the personal, concerns about its authenticity and quality persist. Consequently, participants advocated for increased data collection as the ultimate solution to address these limitations. Thus, if the imaginary of a binary between “data” and “stories” privileges data as a superior epistemic solution, the imaginary of limitation reinforces more data collection as the only solution imaginable. I argue that at stake is how these imaginaries locate the legitimacy of marginalized experiences within the dataset, obscuring how data collected from the grassroots might circulate within and be interpreted by hegemonic knowledge practices. This paper opens a conversation about feminist data activism and the power relations it is enmeshed within, an area that remains underexplored.
Introduction
What was once an object of searing critical analysis has now been reimagined as a central tool in political struggles. That object is data. The shift has been so pronounced that in 2019, critical data scholars Davide Beraldo and Stefania Milan (2019: 8) observed that for social movements, “relying on data-enabled action repertoires is gradually becoming unavoidable.” Data activism is an emergent field of scholarship, exploring both the potential and constraints that data presents for civic engagement, advocacy, and activism. And feminists are not outsiders in this movement. From criticisms of baked-in sexism and other forms of discrimination to channeling their efforts into harnessing data to advance gender equity and justice, feminist scholars and activists have been central figures in shaping the movement's values and practices (D’Ignazio et al., 2022; D’Ignazio and Klein, 2020). This has led scholars such as Yu Sun and Siyuan Yin (2022) to declare that feminist data activism has emerged as a new form of feminist activism.
This paper approaches feminist data activism from a data imaginary perspective (Beer, 2019). Specifically, I seek to understand how everyday people imagine and justify the necessity of data in their struggles against gendered violence. I focus on a prominent feminist data initiative from Australia called She's A Crowd, an organization that promotes itself as having curated the world's most extensive geospatial dataset of gendered violence (Humanitech, n.d). This dataset is composed of crowdsourced first-hand accounts from people who have experienced such incidents and volunteered to share their stories on the She's A Crowd platform. The term “counter data” is often used to refer to this type of data activism, denoting data collected from marginalized groups that fill the data gap in official statistics, challenge mainstream perspectives, and ultimately represent an ethical use of data that runs counter to conventional extractive approaches. That is, data can be used in the service of social justice rather than just for the accumulation of wealth and power (D’Ignazio et al., 2022). Current scholarship on data activism often centers the voices of the “active” activists, the highly skilled practitioners who pioneer data activism efforts (see, e.g. Alvarado Garcia et al., 2017; Cinnamon, 2020; Gutiérrez and Milan, 2019; Lehtiniemi and Ruckenstein, 2019; Meng and Disalvo, 2018), or of civic tech participants, who take part in hackathon events and are often technologically skilled (see Currie et al., 2016).
Conversely, this paper seeks to contribute to the data activism literature by pivoting our attention to the everyday people—the contributors who did not necessarily identify themselves as “activists” in the traditional sense, nor did they consider their act of sharing on the platform as “doing activism.” Nonetheless, they constitute a central part of data activism efforts, as without their volunteering of stories/data, such counter datasets would not exist.
The data imaginary approach I adopt in this paper also runs counter to the dominant approach to data activism that celebrates the potential for individual political agency emerging through citizens’ deliberate engagement with data (Dander and Macgilchrist, 2022). Instead of presuming universalized, sovereign “digital citizens” who are relatively autonomous, that is, those who are not only conscious of the value of their data but also deliberately mobilize data in the service of social justice (Dander and Macgilchrist, 2022: 45), a data imaginary approach sensitizes us to the way in which individual perception of data is always bound up with and shaped by preexisting cultural narratives, thereby highlighting where power relations may continue to exist. In doing so, this paper contributes to the emerging literature on civic data imaginaries and how they are always bound up with both inherent features of the datasets (Nafus, 2024) and collective perception (Baack, 2018; Cinnamon, 2020).
This paper contributes to the emerging scholarship on feminist data activism, and by extension, broader scholarship that has explored the various entanglements between feminisms and digital technologies, including networked feminism (Clark-Parsons, 2022; Fotopoulou, 2016), feminist hashtag activism (Rentschler, 2017), platform feminism (Trott, 2023), and more broadly, digital feminist activism (Mendes, 2019). If relying on data is indeed “unavoidable,” it then becomes imperative to examine how feminist activism is being mediated by data and data infrastructures, as well as the implications of this data turn for feminist activism and advocacy. If we do not, social movements risk becoming privileged sites where the power of data becomes further mythologized.
Through in-depth interviews with 11 participants, I explore what I term everyday (counter) data imaginaries: grassroots collective imaginaries about what data is and what it can do in the service of social justice. In other words, I examine how and why (counter) data become favored in this particular struggle against gendered violence, and ultimately, what these imaginaries might obscure and foreclose. Data imaginaries, like other social imaginaries, do not exist in a vacuum, but instead are collectively held narratives that draw on existing understandings, norms, values, and experiences of the world (Taylor, 2004). Drawing from the interviews, I show, first, that participants construct a binary opposition between data and other nondata narrative forms, and significantly, that this binary emphasizes data as having superior epistemic value in the narration of one's experiences. Second, while participants are critical of the many limitations of data, they paradoxically advocate for increased data collection as the ultimate solution to address these perceived limitations. As I shall argue, these everyday imaginaries of the superior epistemic value of data cannot be simply dismissed as yet another manifestation of data solutionism. While they do stem from dominant narratives about the power of data, they are also rooted in marginalized groups’ historical struggles for believability and epistemic justice (Banet-Weiser and Higgins, 2023; Fricker, 2007). At stake is how these imaginaries normatively locate the legitimacy of marginalized experiences within abstracted and disembodied datasets, which obscures how counter data, empowering as they may seem, still circulate within, and are interpreted and put to use by, hegemonic knowledge practices. Crucially, this paper attempts to open a conversation about feminist data activism and the power relations it reproduces and is enmeshed within, an area that remains underexplored.
She's A Crowd, counter data, and data-driven storytelling
She's A Crowd was founded by then doctoral researcher Zoe Condliffe in 2018 in the wake of the global #MeToo movement. The initiative's mission is to use “counter data” to challenge “existing power dynamics” (She’s A Crowd, n.da). As Condliffe explained in a media interview, the term “counter data” refers to “information that counterbalances the widespread use of biased data” (MYOB, 2022). For her, in domestic violence cases, “there's reason to doubt official statistics capture the full story,” not least because the issue is underreported (MYOB, 2022). Consequently, Condliffe recounted the development of She's A Crowd as a space to address this deficit in official statistics through crowdsourcing from people who have experienced gender-based violence first-hand. She's A Crowd is not unique, but forms part of a larger ecology of feminist initiatives that have been instrumental in generating and sustaining momentum in challenging cultures of sexual violence and misogyny through digital storytelling, mapping of incidents, and crowdsourcing bottom-up data, and that have attracted significant scholarly attention (see Bernardi, 2017; Dimond et al., 2013; Grove, 2015; Rentschler, 2018; Trott, 2023). For instance, organizations such as Hollaback! (now RightToBe) have long histories of using bottom-up data collection methods for advocacy activities. The overall focus of RightToBe is on facilitating what has been referred to as “networked witnessing” (Rentschler, 2018) or a “networked public” (Dimond et al., 2013), which emphasizes the feeling of solidarity and collective empowerment gained through sharing one's story through internet-connected devices. She's A Crowd builds on this established approach, while explicitly positioning itself as a “data activist” organization with a strong emphasis on the centrality of data and data-driven insights as the core instrument in its mobilization.
This is reflected in the mission statement displayed on their website: “Counteracting violence through data collection” (She’s A Crowd, n.da). Six years since its founding, the organization has worked with different stakeholders such as governmental transport departments, rideshare companies, and various music venues so that these stakeholders can design safer and more inclusive services (She’s A Crowd, n.db, 2023).
In an interview with the online magazine Body + Soul (Barraclough, 2019), Condliffe described the moment that catalyzed her development of the platform: “I came across the consciousness-raising movements that were prominent in the 1960s feminist liberation movement. Women then were doing the same thing: meeting up and talking about their experiences of sexual assault and sexism. I thought: why not digitize this process?”
The term “digitize,” however, does not fully capture what actually happens with experiences shared on the platform. I would argue that a more accurate term would be “datafy,” since the platform facilitates systematic documentation of experiences in a standardized process. Specifically, if someone wants to share a story on this platform, they are guided through a 10-step process, covering a range of aspects related to the experience, such as the timing, location, whether victims know their perpetrators, perceived reasons, and consequences. The participants have the discretion to skip any aspect that they feel uncomfortable sharing. The stories shared are then simplified and deidentified before being displayed on a public map, while the complete dataset, along with analyses such as trends and patterns, is only available to specific stakeholders or policymakers who make a purchase. This process, hence, amounts to “datafication,” signifying not only the quantification but also the transformation of embodied experiences into standardized, machine-readable formats. Operating its own closed ecosystem, She's A Crowd largely controls how data are collected, managed, processed, stored, and circulated. This, as Verity Trott (2023: 25–26) argues, signals a strategic shift from ad hoc, reactive awareness campaigns to more “formalized, sustained forms of activism.”
To the extent that She's A Crowd facilitates the sharing of personal experiences on its platform, it is situated within the long tradition of what Berna Ekal and Åsa Eldén (2019) conceptualize as “feminist storytelling”: the narrating of personal experiences for political purposes. The practice of narrating personal experiences, as Nelson (2018) contends, is the unifying element that has bound women together throughout different waves and various political projects of feminism. Narrating—making experiences communicable—is politically significant not only because it helps make women's experiences visible, but more importantly, it lays bare the collective, structural roots of what is often deemed to be personal problems (Ekal and Eldén, 2019). In this sense, the diverse storytelling practices within feminist activism, spanning from physical settings such as closed consciousness-raising groups and shelters to digital platforms including online blogs, social media, and counter data and mapping platforms, collectively embody a longstanding tradition of sharing personal experiences for political, feminist ends. Indeed, She's A Crowd consciously evokes this long tradition of feminist storytelling in the positioning of their organization, branding itself as an “innovative storytelling platform.”
The 10-step process on the platform undeniably influences how stories are narrated, dictating what aspects should be recalled, and in what sequence. This is not to say other forms of sharing do not have a genre convention that conditions what is speakable and how it is spoken. For instance, #MeToo, ultimately, is a genre of discourse, circulating within a capitalist attention economy and commercialized logics of visibility that has been centering the experiences of female subjects who are predominately white and wealthy (Banet-Weiser and Higgins, 2023; Hewa, 2022). Rather, this is to highlight that crowdsourced platforms such as She's A Crowd represent a distinct genre of feminist storytelling, namely datafied storytelling, that merits more scholarly attention. Indeed, in my discussions with participants, numerous accounts emerged regarding the limitations of this data-driven storytelling approach in fully capturing the complexities and nuances of their experiences. The primary interest of this paper, however, is not so much in how She's A Crowd's storytelling framework may constrain how stories and experiences are narrated—although this is a significant issue. Instead, my focus is on understanding why, despite acknowledging all these constraints, participants continue to willingly volunteer their stories and data on the platform. In simple terms, how do they justify the need to convert their experiences into data?
A data imaginary approach
Scholars in the evolving field of data activism have examined the ways data have been used to promote social justice, the extent to which data can be used to empower grassroots actors, and its many limitations (see, e.g. Gutiérrez, 2018; Lehtiniemi and Ruckenstein, 2019; Meng and Disalvo, 2018; Milan and Gutiérrez, 2015; Milan and van der Velden, 2016; Milioni and Papa, 2022; Schrock, 2016). Complementary to this, data feminism focuses on the power dynamics that unfold in a data-driven society, considering how to apply feminist theories and principles to create a more equitable form of data science (D’Ignazio and Klein, 2020). Pulling these two fields together, researchers in feminist data activism have explored how data have been and can be used in the service of gender equity and justice (D’Ignazio, 2022; D’Ignazio et al., 2022; Sun and Yin, 2022), as well as how the use of data by some feminist activism projects contributes to reproducing existing power structures when the prevailing logic of datafication clashes with feminist epistemologies (Fileborn and Trott, 2021).
Within the current scholarship, however, data is largely taken as a fixed and stable object, be it visualizations, numbers, or statistics (Fileborn and Trott, 2021; D’Ignazio, 2022; D’Ignazio et al., 2022; Sun and Yin, 2022). This perspective overlooks how data is also historical and always embedded within particular political struggles. That is, what data is and what it can do are continually (re)produced by individuals through their interactions with it: through the way they articulate, advocate, interpret, challenge, celebrate, and deliberate over data's significance and values (Dourish and Gómez Cruz, 2018). To put it differently, data is discursively and communicatively constructed by a range of stakeholders and technological systems (Thornham and Gómez Cruz, 2016). In their study of hackathons in the UK, for example, Thornham and Gómez Cruz (2016) argue that data is discursively constructed as “clean,” “malleable” and “open,” and hence positioned as a bedrock for innovation and creativity. They argue that this discourse of cleanliness, malleability, and openness problematically obscures the significant efforts invested in the cleaning of data, a process that often carries significant political implications. It is this understanding of data not just as material and technical objects, but also as a “discursive regime” (the collective visions that promote and weave data into everyday practices) that shapes the core premise of my study.
Research on data imaginary provides some guidance here. Drawing on Nagy and Neff (2015), Baack (2018) explored the “imagined affordances of data” by civic technology activists, revealing how the work data do depends not only on their inherent features or properties but also on the perceptions held by data activists. Similarly, Cinnamon (2020: 628–629) has analyzed how various data activism organizations in South Africa operate via a particular collective understanding of data—a data imaginary that asserts data as “necessary, suitable and effective” for exposing injustices and grounding their advocacy on a “rational and scientific plane.” In this study, Cinnamon (2020) calls for more attention to be paid to the way in which data are perceived and discursively constructed by data activists. This perspective aligns with Thornham and Gómez Cruz (2016: 5), who argue that “discursive constructions of data have material consequences in terms of what becomes possible to build, to imagine or to utilize.” Simply put, what is imagined shapes what is done.
Method and data
My study, however, diverges from the existing scholarship on data activism by turning its focus to the everyday as a critical site where collective imaginaries of and beliefs about data are (re)produced. That is, instead of focusing on the pioneers and professionals (the highly skilled practitioners who initiate, organize, promote, and mobilize the data activism efforts), I focus on everyday participants who volunteered their data/stories to the She's A Crowd counter dataset without holding a highly involved role in the organization and without identifying themselves as “data activists.” In particular, this paper draws on 11 in-depth semistructured interviews with people who volunteered their data and experiences of gendered violence to the dataset of She's A Crowd. I also draw from an in-depth interview with a staff member at She's A Crowd as well as publicly available material on its website and social media. The interviews examined here form part of a broader research project examining the history, political economy, and everyday imaginary of feminist data activism. All interviews were conducted between March and July 2023.
The interview questions were guided by my theoretical approach, the notion of “data imaginaries,” and aimed to explore the significance and meanings that individuals attach to data. The interviews were loosely structured, revolving around three major themes: How do everyday people rationalize and justify the necessity of data? What do people hope to achieve, personally or collectively, when they contribute their stories of gendered violence to She's A Crowd? Finally, do people have any concerns or reservations when contributing their stories as data? I adopted an inductive approach to analyze my data, meaning I did not have predetermined categories or hypotheses regarding what motivated my participants to share their stories in datafied formats. At the same time, in keeping with the notion of data imaginaries (Beer, 2019), I maintained a critical orientation in my interpretation, attending to the social embedding of ideas and seeking to ground individual subjective experiences within historical–sociocultural texts and contexts.
While I did not specify gender in my recruitment, a majority of the volunteers identified as women: of the 11 participants, nine identified as women, one as queer, and one as a man. The ethnicities of the participants were diverse, with seven identifying as white/Caucasian, three as South Asian, and one as African American. One participant reported living with a disability. All interviewees are anonymized by way of a pseudonym. The call for participants was shared on my Twitter account and pinned on my university's notice board. This limited the visibility of the call to my own network, whose members were already part of an urban and relatively highly educated context, and thus limited the diversity of socioeconomic and geographic backgrounds. Indeed, a majority of my participants were living in metropolitan areas and had university degrees. With this sample size, this paper does not make claims about broad population-wide trends, but it does aim to help us better understand how everyday people deliberate on the necessity of data, how they discursively imagine the power (and limitations) of data, and what this might mean for social movements addressing gendered violence.
Researcher positionality
I have experienced sexual assault and have faced various forms of gender-based violence throughout my life. This includes everything from casual remarks in public spaces to abusive circumstances in my family environment. Violence for me, as Crooks and Currie (2021: 206) suggest, “is not invisible and waiting to be documented: it is an intimate fact of daily life.” As a media studies scholar who is informed by conceptual frameworks from feminist science and technology studies (STS), feminist media studies, and critical data studies, I have found it unsettling that the collection of “counter data” (data from the marginalized) has been celebrated as inherently empowering. While the tensions are acknowledged, as in D’Ignazio and Klein's (2020) caution against the burden of proof that is always imposed on marginalized community members to identify and substantiate claims of their oppression, they are often added as caveats rather than treated as central tensions within data activism that we must grapple with. Ultimately, while I wholeheartedly endorse and respect many efforts to collect data to understand more about marginalized lives, I also advocate for what Miranda Fricker (2019) terms a “healthy pessimism” that is always attentive to the conditions under which knowledge about marginalized people is sought and the interpretative lenses within which this knowledge is enframed.
Acknowledging my positionalities, as both a survivor and a media scholar, is not to claim an insider perspective. Nor am I suggesting that my academic background grants me a more “informed” perspective compared to my participants. Rather, this is to highlight that I do have stakes in the research, as well as to openly disclose my theoretical framework and its role in my contextualization and interpretation of participants’ responses. Indeed, I have presented this research at conferences and seminars, and one of the recurring comments from the audience has been about how I managed to separate my own experience of gendered violence from my identity as a researcher (who is clearly critical of datafication). My answer was that I did not try, and I believe that it would be futile to try. It is impossible for me to separate my identity as a victim from my identity as a researcher, for it was these very identities that influenced my focus and my research question in the first place. What I have done instead is maintain transparency with my participants and now with you, the readers, regarding my own experiences, the motivations behind my research, the questions I am exploring, and my approach to analyzing participants’ responses. In the paragraphs that follow, using data imaginary as a framework, I will present how participants narrativize the need for translating their experiences into data forms before discussing the fragilities of these imaginaries.
Findings
The binary imaginary: Data is superior to personal anecdotes
For many participants, it took years after the incidents to open up on She's A Crowd, either because the platform didn’t exist, because they weren’t aware of the platform back then, or because it took a considerable amount of time for them to come to the realization that the incident was, indeed, significant enough to share. However, before sharing on She's A Crowd, most participants had already shared their experiences elsewhere, including with law enforcement, close friends and family, or their counsellors. Often, the responses they received were either dismissive or involved outright victim-blaming. These experiences reflect the criticisms levelled at processes of reporting in which once a story of gendered violence is shared, once a testimony of harm is uttered, women are immediately placed within an economy of credibility, where their claims to knowledge are met with condescending disbelief (Banet-Weiser and Higgins, 2023; Fricker, 2007; Rentschler, 2022). It is this context of disbelief that spurred many of the participants I spoke with to perceive a binary between data and other (qualitative) narrative forms. More significantly, they privileged data-driven storytelling as a more credible means of narrating personal experiences, that is, as a superior epistemic solution. As I will later discuss, rather than being treated as an instance of “data solutionism” (a critique often directed at those who uncritically embrace data as the solution for every complex social issue), this desire for and privileging of data should be understood within this “economy of believability” (Banet-Weiser and Higgins, 2023): a gendered, racialized, and classed economy that has historically devalued and discredited women and other marginalized groups in their claims to know (Fricker, 2007). Turning their experiences into data, then, is a way for participants to assert their credibility and claim authority when other forms of reporting and storytelling have failed.
Before exploring the wider context of the economy of believability that underpins this perceived binary, I will first address how this imaginary is framed. My interviews often began with a general prompt about participants’ motivations for sharing on She's A Crowd. One of the recurring responses I heard was an appreciation for the standardized storytelling process afforded by She's A Crowd (the 10-step process described above). Although my participants at times expressed reservations about the potential simplification of their experiences, overall, they told me that they loved the process, as it required minimal labor from them. For example, May, a postgraduate student and an advocate for women's rights, expressed her appreciation for the simple storytelling process, explaining that it allowed her to simply “tick the box,” “pin it on the map,” and “skip the parts” she was not comfortable sharing. In other words, participants appreciated the ease with which they could share their experiences, which was enabled by the multiple-choice format that organized their experiences into distinct, orderly segments. For Sneha, another postgraduate student, these discrete “points of information,” such as the incident's time, location, and whether she knew the perpetrators, while being simple and straightforward, were sufficient to encapsulate the essence of any experience of gendered violence. In Sneha's words:
If they’re going to make me do that, there's no way I’m going to type out long paragraphs about my story of sexual assault. So then as I started filling out a questionnaire which is basically like an MCQ (multiple choice question) style, I understood how easy it was to file your story and how there is always like, a few points of information that are always pertinent like time of day, the location, whether you knew the person or not. […] So, it makes it easy to file, and like almost every part of my experience was documented there.
For participants, the breaking up of experiences into data points is beneficial, not only because it tidies up and organizes their experiences in a way that is more conducive to sharing, but also because their experiences could then be categorized, analyzed, and aggregated, which, as Sneha suggested, “gives them strength in number.” In Raw Data is an Oxymoron, Lisa Gitelman and Virginia Jackson (2013: 8) reflect on the popular conception of data as “particulate” or “corpuscular, like sand or succotash”: that is, “data exist in little bits.” We see a similar conception of data as discrete bits in Sneha's and others’ accounts (in phrases such as “a few points of information” or “tick the boxes”). Importantly, as Gitelman and Jackson (2013: 8) argue, the perceived “particulate” nature leads to another perception, that data are “aggregative,” a characteristic that imbues data with power. This was exactly what Sneha later reflected, as she explicitly suggested the binary between “data” and “story”: “story can only be anecdotal, it can never account for larger narrative change.” On this view, sharing experiences in nondata formats will only ever amount to “anecdotes,” which cannot be accumulated and thus cannot be impactful. Data, by contrast, are perceived as being accumulate-able, thereby carrying more weight.
Indeed, in my interviews, it was common to encounter the term “anecdote” used to describe the recounting of experiences in nondata forms. Historically, the term anecdote referred to something private and unpublished, but it now carries the connotation of lacking scientific rigor. Writing about different perceptions of evidence, Sara Ahmed (2016) succinctly captures this discursive construction of the term “anecdote,” noting how “anecdotal” is often used to suggest a deficiency in the evidence; that is, anecdote is “a series of first-hand impressions” which might be “distinguishable from the evidence generated by systematic research or provided by an expert.” I also noticed this discursive construction of anecdote as lacking rigor, as compared to data, being echoed elsewhere. For example, when I put the query “anecdote vs. data” on Google, the first result I encountered was the introduction to the online Data Science Curriculum of the University of California, Los Angeles (UCLA), where a stark distinction is drawn: “Data beats anecdotes. […] Anecdotes can contain personal bias, might be carefully selected to represent a particular point of view, and, in general, may be completely different from the general trend” (UCLA School of Education and Information Studies, n.d). Turning their stories into data so that they can be aggregated with other datafied stories, in short, is a strategy to sidestep the characterization of their accounts as being merely “anecdotal.” Isabella summed up this binary succinctly as follows: “stories are not data” (when she addressed my question about her preference for sharing on She's A Crowd over social media).
This response is in keeping with Neff and Fiore-Gartland's findings (2017), who, in their study of data valence within health and wellness communities, highlight that the perceived “truthiness” of data is not necessarily based on any consideration into the validity of data, but rather on an affective sense of truth ascribed to data in general. In other words, data is something that just “feels” truer.
This affective truthiness and objectivity of data emerges as a recurring theme in participants’ remarks. Interestingly, however, some participants offered justifications beyond data simply “feeling truer.” Alice, a legal assistant who experienced image-based abuse during her secondary school years and subsequently had to move to a new city to put the incident behind her, emphasized the importance of sharing her experience as data because this allowed her to remain anonymous. At the time of the incident, Alice shared it with her school counsellor, only to be dismissed as experiencing “mere bullying.” This experience of dismissal, disbelief and backlash made her hesitant to share her story with others afterwards, including her parents. “They [the counsellor] don’t believe me,” she lamented, “why would anyone?” Comparing this experience of being dismissed with sharing on She's A Crowd, she said, “I could just put it out there, and it would be out there, and no one would comment on it.” Sharing her experience as data, then, became a way to detach her identity from her experience, the validity of which then became less susceptible to scrutiny. Another participant, Charan, a school administrator, echoed this sentiment, citing men's online vitriol as the reason for her hesitance to share her experience publicly on social media, something she did not have to encounter when sharing on She's A Crowd.
The power of data as being more objective is particularly valorized in an algorithmized economy of attention, where certain messages and practices tend to be circulated more and become more visible than others (Banet-Weiser, 2018). Sneha told me that it was precisely because of this “algorithmized logic” of hashtag activism (Etter and Albu, 2021) that she decided to share her story on She's A Crowd. For her, social media tend to selectively elevate certain stories based on factors such as the involvement of public figures or sensational details, a shortcoming that data activism supposedly overcomes:
Hashtag activism has been a little more haphazard, like it has been more scattered across platforms. And so only certain stories which get viral gain prominence. But when your story is actually on She's A Crowd, the merit of your story is not based on the likes and retweets against it. It was just based on objectivity. [For hashtag activism], some stories more than others rose to prominence, because either they involve the public figures, or some sensational details, like, eye catching details […] But on She's A Crowd crowdsourcing platform, every story is treated with equal merit, in a sense.
Social media not only confers visibility, but also bestows legitimacy. Sarah Banet-Weiser and Kathryn Claire Higgins (2023) capture the power of social media as the contemporary dominant site for the determination of “believable” evidence and the performance of “believable” subjecthood. The (digital) #MeToo movement, for them, is a case in point, as affordances of social media shape who is believed, what kinds of stories get elevated and how survivors must avail themselves of social media logics of storytelling in order to establish credibility. Mapping onto these challenges, counter data platforms like She's A Crowd offer a new outlet for participants to disrupt the “economy of believability” configured by social media. Every story, regardless of the identities of the storytellers and the details of their stories, is conferred equal value and recognition. Less “believability labor” is required: instead of trying to deliver what is considered truthful based on the “going public” conventions dictated by social media, participants can just “tick” and “skip.” Finally, the massive number of stories on She's A Crowd redefines the parameters of what is deemed believable as it turns unbelievable individual anecdotes into collective believable ones. In the words of Katherine: “[…] by sharing with them [She's A Crowd], I was being not only believed, it was data to let people know, this is a problem.” For Katherine, beyond validating the factual occurrence of her individual experiences, stories aggregated as data on She's A Crowd serve to highlight the systemic nature of these experiences, thereby conferring legitimacy back to individual accounts. Other narrative forms are implicitly or explicitly disavowed in favor of datafied storytelling, I argue, because of the historical struggle of women for “believability,” or as Miranda Fricker (2007) would say, for “epistemic justice.”
The limitation imaginary: Data has limitations, so we need more data
Yet, even when data is divorced from the personal and the subjective in order to be perceived as a “superior” form of storytelling, reservations about the potential falsity, authenticity, and quality of the data still linger. During the interviews, I asked my participants what they thought would be the limitations of counter data efforts like She's A Crowd. The concern I heard most routinely was that there was never enough data. Sneha expressed this sentiment by stating, “I’m just like a drop in the ocean type of thing. Like, I wish I can get more people to contribute to their [She's A Crowd's] dataset.” She went on to ponder whether the data collected by the platform would ever be enough in terms of sample size. She raised questions about geographical diversity, age diversity, and range of experiences within the database, emphasizing that true diversity can only be achieved through the participation of more individuals sharing their experiences. Like Sneha, Katherine expressed concern about a lack of data, attributing it to a lack of funding which impedes awareness of the initiative: people do not participate simply because they are unaware of it. Nonetheless, she remains hopeful that if She's A Crowd “can have more funding in, they can get the word out more.” Rather than worrying about a lack of data in general, Sheetal expressed her concern about the lack of “high-quality data.” As a researcher and a women's rights advocate, she reflected on her own practices in relation to collecting data:
I think you can always also have too much data. Just like when you do a literature review, you can see there's just so much information out there and so much data, maybe it just adds on certain themes, certain topics, it's just too much. And you know, it can confuse decision makers, researchers, people, experts in this area. I think we need good, high-quality data, there's just maybe too much collection of various information.
Isabella raised a different concern, about how such data are received:
I wonder how seriously the data is taken. Given that it's purely personal experience. It's like it's purely anecdotal data. […] I wonder if they [the policymakers] take the data, seriously enough, given that they can’t go back and visit those people who’ve given the data?
This statement is remarkable because of the pronounced double standard it evokes: while data in general, without clear ownership, readily acquires legitimacy, data crowdsourced from women are relegated to the dubious category of “anecdotal data.” Isabella said this with a great sense of reflexivity, emphasizing that she herself did not doubt the anonymous accounts featured on She's A Crowd, but rather worried about how others, policymakers in particular, would perceive such a dataset. Isabella's proposed solution, once again, involves the accumulation of more data: “the most impactful way for anecdotal data to be taken seriously, is when you just have a massive number.” Towards the end of our conversation, Isabella raised the notion of a “tipping point,” a point where a specific number of datafied stories is reached, leading to survivors’ accounts being taken more seriously. She said, “I think it's almost like there's a tipping point, right? It makes me think of court cases where, you know, one victim survivor comes forward, and they are dismissed. But if 10 come forward against the same accuser, or speak about the same thing happening, then suddenly, that's more difficult for people to ignore.” That tipping point, for Isabella, can be realized on counter data platforms like She's A Crowd. She said, “there are so many entries on She's A Crowd. They have the numbers to be like [to convince people to think]: I don’t think this many people got together and decided to lie.” If the imaginary of a binary privileges data as a superior epistemic solution, the imaginary of limitation reinforces more data collection as the only solution imaginable.
Repeatedly, these accounts reflect a “more data is better data” mindset: if the data fail to do what we want them to, whether because they are insufficient, lack diversity, or are anecdotal or unverifiable, this is taken to indicate a need for a more comprehensive dataset. This rhetoric is eerily familiar. As Loukissas and Pollock (2017) observed in the wake of the 2016 US presidential election, the disparity between the actual results and the preelection statistical forecasts, surprisingly, did not diminish public trust in data. Rather, the appeal of data remained intact. As Dourish and Gómez Cruz (2018: 8) noted, “if the data did not anticipate the results, then we must simply not have had enough of it, or not have had the right data.” In the words of critical data scholar Noopur Raval (2021), data exists “in a state of permanent iteration, hopefulness and potentiality where the only solution to its shortcomings is ‘to get more data, to know more.’” But the elusive nature of the tipping point raises the question of who is going to decide how much data is enough, and for whom. As D’Ignazio and Klein (2020) argue, collecting more data could become an endless loop: any data-based evidence could be dismissed for not being clean enough or big enough, setting the stage for an endless quest for data.
Further, the privileging of data as a superior epistemic solution overlooks the reality that it is not enough to collect data from the ground up; we need to critically engage with and scrutinize the knowledge practices associated with datafication. In simple terms, it is not just a question of who has the power to decide how much data is enough, but also of who determines how these data are put to use. In an Instagram post, She's A Crowd (2023, August 14) emphasizes the importance of survivors sharing their stories, writing “stories allow subjects to become protagonists, authors, scriptwriters, sculptors, weavers, elders, and puppeteers. A subject can become the center of her story, when she is the author.” While I agree with the fundamental sentiment expressed here, it is important to point out that these stories, which become data points, are then subject to an interpretive frame that ultimately lies beyond the control of those who initially share them. For example, a staff member at She's A Crowd shared with me that one of their clients, a Transport Department, approaches their dataset from a public safety perspective, meaning that the Department uses data provided by She's A Crowd to inform physical changes such as more lighting and CCTV cameras. This approach to women's safety belongs to a paradigm called Crime Prevention Through Environmental Design (CPTED), an approach that, in essence, aims to reduce situational opportunities for violence through infrastructure improvements (Hälterlein, 2021). Feminist geographers have long critiqued the simplistic and problematic promise of “designing away” violence that underpins the CPTED approach. For example, Hille Koskela and Rachel Pain (2000) have criticized this approach for mechanistically locating gendered violence within the built environment rather than considering it in relation to sociopolitical structures such as gender, class, and race, which traverse space. Nonetheless, this popular practice persists (Kern, 2020). This anecdote illustrates how grassroots data are being subsumed within a problematic knowledge practice.
I point this out, not to suggest that the data imaginaries held by participants are wrong or misleading, but rather to highlight their fragilities and the power relations at work. In the same interview mentioned above, the staff member from She's A Crowd also acknowledged that how data are used by the stakeholders they work with was beyond their control. Here, I argue that the transformative potential attributed to grassroots data remains tempered by the broader logics of an environmental crime prevention approach, shifting the critical impetus away from the root cause which, as Bevan (2023: 14) argues, is the “patriarchal ideology and its embeddedness in lived experience and institutions of power,” rather than, say, a lack of lighting or surveillance. As Rebecca Tuvel (2015) astutely argues in her critique of gender mainstreaming in climate change negotiations, there can be epistemic injustices related to the projects that seek to include and incorporate women's knowledge. This is especially true, she says, when prevailing knowledge practices treat women merely as objects of knowledge formation rather than as creators of knowledge with valuable epistemic resources (Tuvel, 2015). If women cannot decide how their data will be put to use, then collecting more data, as Tuvel would argue, will just amount to more objectification of women's knowledge. In the same vein, in his reflection on the challenges posed by the Big Data era, Mark Andrejevic (2013: 165) argues that the critical task for us is “not simply to reimagine infrastructural arrangements, but also the knowledge practices with which they are associated.” While Andrejevic was concerned with how the knowledge practices associated with Big Data are often controlled by narrow sets of corporate actors, it does not seem farfetched to extend this critique into the realm of seemingly empowering counter data practices.
To borrow from Spivak (1988), we must ask if the subaltern can really speak if their speech is too readily embedded within a hegemonic system of knowing.
Conclusion
This paper has shown how She's A Crowd emerged as a new genre of datafied storytelling, and how this has been embraced by everyday people through an imaginary of data as having superior epistemic value in comparison to other forms of storytelling. Significantly, I have also shown that while participants acknowledged the many limitations of data, they paradoxically advocated for increased data collection as the ultimate solution to address these perceived limitations. As I have argued, these imaginaries are not misguided. Instead, they are rooted in marginalized groups’ historical struggle for believability and epistemic justice. At stake, however, is how these imaginaries normatively locate the legitimacy of marginalized experiences within the abstracted and disembodied dataset, obscuring how dominant knowledge practices continue to temper the transformative potential of grassroots data projects. More broadly, this case study has demonstrated a need to rethink any simple characterization that diametrically opposes an “oppressive Big Data regime” to “empowering data activism,” where the former is associated with Big Data corporate practices and the latter with grassroots actors. Instead, the paper directs our attention to the conditions under which knowledge about marginalized people and their circumstances is sought and utilized.
To observe the fragilities of counter data efforts is not to dismiss their potential. Nor is it to suggest that qualitative forms of narratives are inherently more ethical or more suitable to a feminist project. This paper itself, after all, is an endeavor to “source data” from everyday people, which positions me fully within, rather than outside, the critique I have presented. What I hope I have made clear, however, is that data, just like other nondata forms of narrative, is “a social interaction—actual or imagined or anticipated or remembered—in which what gets told is shaped by the (perceived) interests of the listeners, by what the listeners want to know and also by what they cannot or will not hear” (Brison, 2002: 102). What is important, then, is not to determine which is better, data or nondata forms of narrating (or, more broadly, quantitative or qualitative methods of producing knowledge). Ultimately, a feminist politics would ask how this distinction is made, who is able to make it, and how resistance can be mounted (Bassett et al., 2020). Hence, if policymakers decide to interpret and use these data in ways that reinforce hegemonic practices, as in the case of the situational and environmental crime prevention mentioned above, the question then becomes: how can we challenge and counter that perspective? This last concern is where I see the potential for future research and the practice of feminist data activism to make a constructive impact.
Acknowledgments
I am deeply indebted to my research participants, who generously shared their experiences of gendered violence and their use of She's A Crowd to narrate these experiences. I extend my appreciation to She's A Crowd for supporting my research, dedicating time to answer my inquiries, reviewing interview notes, and facilitating connections with two additional participants. Special thanks to Thao Phan, Mark Andrejevic, and Verity Trott for their thorough reviews of earlier drafts, along with their valuable comments and suggestions. I also want to acknowledge my colleagues at the Gender and Media Lab, Monash University, for their generous engagement with this work during the presentation of preliminary findings at our monthly meeting.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval statements
The interviews were approved by Monash University Human Research Ethics Committee (project ID: 32060) on 7 November 2023.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Australian Government through the following Australian Research Council grants: DP200100189 and CE2000100005.
Research data
The participants of this study did not give written consent for their data to be shared publicly, so due to the sensitive nature of the research, supporting data is not available.
