Abstract
While research has been conducted with and in marginalized or vulnerable groups, explicit guidelines and best practices centering on specific communities are nascent. Black Twitter offers an excellent case study for examining this aspect of research. This research project considers the history of research with Black communities alongside empirical work exploring how people who engage with Black Twitter think about research and researchers. From this, we suggest potential good practices and outline what researchers should know when studying Black Twitter or other digital traces from marginalized or vulnerable online communities. From our interviews, we gleaned that Black Twitter users feel differently about their content contributing to a research study depending on, for example, the type of content and the positionality of the researcher. Much of the advice participants shared for researchers involved encouragement to cultivate cultural competency, get to know the community before researching it, and conduct research transparently. Aiming to improve the experience of research for both Black Twitter and researchers, this project is a stepping stone toward future work that further establishes and expands our understanding of user perceptions of research ethics in online communities composed of vulnerable populations.
Introduction
Traditional human subjects research guidelines encourage researchers to take special care when studying vulnerable or marginalized groups. Consent practices are often overseen by an institutional review board (IRB) or other ethics review body. However, studies of public data, such as social media posts, typically do not fall within the scope of human subjects research (Fiesler & Proferes, 2018; Vitak et al., 2017). This type of research also frequently includes populations that might be vulnerable, marginalized, or stigmatized, such as communities that convene around health conditions, sexual orientation, or race. In this article, we consider one example of such a population, Black Twitter, as a means of thinking through ethical guidelines for researchers.
Researchers across various disciplines have begun to rigorously study data from Black Twitter, a digital community within Twitter (Graham & Smith, 2016; Jones, 2015; Klassen et al., 2021, 2022; A. Williams & Gonlin, 2017). Because Black Twitter is composed mainly of people who identify as Black, it represents a potentially vulnerable population being observed and remarked upon. Shilton et al. (2021) call for qualitative and ethnographic approaches to understanding communities before conducting research with their digital data. But what best practices can researchers rely on for research with public data from vulnerable online communities?
To answer this question, we conducted 18 semi-structured interviews in which we asked self-identified Twitter users familiar with Black Twitter about their understanding of, and concerns about, research conducted on Black Twitter. Our study builds on research ethics work about both vulnerable populations and research conducted on Twitter (Dym & Fiesler, 2020a; Fiesler & Proferes, 2018). We also consider our findings in the context of the history of research within the Black community as we address these questions:
What do Black Twitter users think about their content being used in research?
Which researchers do Black Twitter users feel more comfortable engaging with for public data research?
What do Black Twitter users think researchers should know when researching Black Twitter?
We garnered insights on how Black Twitter users prefer to engage with researchers, how they prefer their data to be used in research, and what advice they have for researchers. From our interviews, we gleaned that Black Twitter users feel differently about contributing to a research study depending on, for example, the type of content and the positionality of the researcher. Much of the advice participants shared for researchers involved an encouragement to cultivate cultural competency, get to know the community before researching it, and conduct research transparently.
Background
Research and Black Communities
In the 2017 Black Mirror episode “Black Museum,” a Black character is experimented on while incarcerated, and museum visitors force a digital version of him to suffer for their entertainment (Brooker & McCarthy, 2017). In the third episode of the 2020 HBO series Lovecraft Country, a local scientist horrifically experimented on a group of Black people who, after death, haunted his home (Green & Sackheim, 2020). These stories of how the advancement of science, including technology, dehumanizes and degrades the lives of marginalized people are not found only in the imaginations of science fiction writers. In real life, various marginalized groups have suffered for the pursuit of scientific research (Angell, 1990; Hodge, 2012), including the Black community (Washington, 2006). The following three examples highlight the ramifications of this history, including valid reasons for the hesitancy and mistrust toward researchers found in the Black community today.
Real-world suffering in the name of science inspired the research-related trauma depicted in the science fiction noted above. The origin of gynecology in the United States during the epoch of chattel slavery, for example, reveals a sordid history of the abuse of Black women for the benefit of White women and of the White male doctors who gained recognition and esteem for their atrocities. One such doctor was Dr James Marion Sims, who conducted gynecological experiments on enslaved Black women (Cooper, 2017). Cooper points out that these enslaved women had no way to give consent, because it was their masters who released them to the experiments.
In Rebecca Skloot’s nonfiction book, The Immortal Life of Henrietta Lacks, the story of a Black woman and her incredible cells unfolds to reveal the financial gains and immoral behavior of an entire industry. Henrietta “Henny” Lacks, born in Virginia in 1920, was a mother and grandmother living in Maryland in early 1951 when she went to the only hospital in her area that would treat Black patients (Johns Hopkins) for a “knot” in her womb (Skloot, 2010, p. 13). After a biopsy, she was diagnosed with cervical cancer, and she died later that same year. Without Lacks’ knowledge or permission, the biopsy was given to a lab where its cells continued to grow. Technicians at the lab soon discovered that her biopsy had yielded immortal human cells. What began as one lab sharing the cells with other labs turned into a multi-million-dollar industry, as the cells were sold and used as “HeLa cells” across scientific disciplines, unbeknownst to her family for decades (Skloot, 2010, p. 37). Lacks’ descendants are now suing a large pharmaceutical company that earns substantial profits each year from selling her cells (MSNBC, 2021).
Black women have not been the only ones to suffer at the hands of science. A better-known example of unethical scientific research involved Black men and a disease that researchers allowed to progress untreated despite a known cure. Initially called the “Tuskegee Study of Untreated Syphilis in the Negro Male,” it is now known as the “USPHS Syphilis Study at Tuskegee.” According to the Centers for Disease Control and Prevention (CDC), the experiment included 600 Black men, 399 with syphilis and 201 without, and lasted from 1932 to 1972. Although penicillin had become the treatment of choice for syphilis by 1943 and was widely available, researchers did not offer it to the men in the experiment who had syphilis (Centers for Disease Control and Prevention (CDC), 2021). The men in the study were offered free medical examinations, meals, and burial insurance. After the study was made public, a lawsuit led to a settlement that provided medical and health benefits to surviving participants and, eventually, to their spouses and children.
In the contemporary medical and healthcare landscape, institutional racism against the Black community remains pervasive. Consider the COVID-19 pandemic. Medical equipment relied on heavily during the pandemic, such as the pulse oximeter, carries biases that affect many in the Black community. A pulse oximeter estimates blood oxygen saturation by shining two wavelengths of light (red and infrared) on the skin and measuring the difference in light absorbance between them. For Black patients, the pulse oximeter is 2.5 times less accurate, likely because of skin pigmentation and because the algorithm used to calibrate the device was built mainly on data from White patients (Tobin & Jubran, 2022).
In addition, the virus disproportionately affects Black people, who in the United States are hospitalized or die at rates 2.7 times higher than White people. This disparity stems not from physiological differences but from a system of structural racism steeped in White supremacy, creating a pandemic within a pandemic (Laurencin & McClinton, 2020). Moreover, vaccine hesitancy in the Black community is understandable (Laurencin, 2021). Increasing the number of Black people in medicine, from doctors and dentists to nurses and pharmacists, can go a long way toward building trust between the Black community and the healthcare industry (Laurencin, 2021). Granted, doing so will take years, so awareness and meaningful action now are of the utmost importance.
The examples shared here speak only to medical research that went wrong in the Black community, a vulnerable population. As Hampton points out, the long history of abuse and misconduct in science against various people continues in artificial intelligence (AI) (Hampton, 2021; Turner et al., 2021). We propose that these histories also extend across the tech industry. Consider Kodak’s Shirley cards, which were used to calibrate color in photographic printing from the 1950s to the 1990s. For decades, film calibrated against these cards neglected darker skin tones, until manufacturers of brown products such as chocolate and furniture demanded better (Benjamin, 2019). Course correcting after the damage has already been done seems to be the rule rather than the exception. The Tuskegee experiment and other atrocities in research led to the creation of reactive human subjects research standards (Bruckman, 2014). What will it take for public data research to receive the same guidelines and best practices? Without careful consideration, technology research using public data may contribute to such harm, if it has not already. With these examples in mind, we now examine the research conducted about Black Twitter to establish a baseline for our case study and research project.
Black Twitter
When André Brock first wrote about Black Twitter in 2012, he defined it as “a user-generated source of culturally relevant online content, combining social network elements and broadcast principles to share information” (Brock, 2012, p. 530). In his 2020 book Distributed Blackness: African American Cybercultures, Brock (2020) further expanded on this definition:
Black Twitter is an online gathering (not quite a community) of Twitter users who identify as Black and employ Twitter features to perform Black discourses, share Black cultural commonplaces, and build social affinities. While there are a number of non-Black and people of color Twitter users who have been “invited to the cookout,” so to speak, participating in Black Twitter requires a deep knowledge of Black culture, commonplaces, and digital practices. (p. 81)
In addition to research about Black Twitter, scholars also use Black Twitter data specifically to (1) apply an existing theoretical framework to better understand the Black Twitter community, (2) examine how the Black Twitter community responds to different forms of media or events, or (3) investigate the linguistics of Black Twitter. In the first category, Graham and Smith approached their study from a sociological perspective and compared sets of hashtags to establish Black Twitter as a counter-public (Graham & Smith, 2016). In Urban Education, Hill extends Graham and Smith’s earlier work to include pedagogies of resistance using hashtags such as #SayHerName and #BlackLivesMatter (Hill, 2018). Lavan is another scholar who researched Black Twitter with an existing theoretical framework. Lavan compared Black Twitter to a social and political watchdog, engaging directly with “signifyin’” hashtags like #PaulasBestDishes and the more activist-focused hashtag #IfTheyGunnedMeDown (Lavan, 2015, p. 58). According to Florini (2014), signifyin’ encompasses “Black oral traditions . . . Black cultural practices . . . Black subjectivities, and . . . shared knowledge and experiences” (p. 224). Scholars in this category overwhelmingly isolated specific hashtags, quoted tweets directly, or described the content of tweets. In some cases, the scholars included screenshots of tweets.
The second category of scholars focused mainly on Black Twitter and media. The first set of scholars in this category, Williams and Gonlin, considered Black Twitter alongside the television series How to Get Away with Murder and discourse on Black womanhood. The two authors described Black Twitter as a space for signifyin’ and Black people commiserating on their shared experiences. For their method, Williams and Gonlin relied on a social media analytics company, Crimson Hexagon (CH), to search Twitter for tweets that contained specific words or phrases (“HTGAWM,” “htgawm,” or “how to get away with murder”) resulting in a yield of “8,017,877 tweets over a 25-month period (1 September 2014 to 15 October 2016)” (A. Williams & Gonlin, 2017, p. 989). The researchers had access to usernames, geotagged locations, and other metadata for the tweets. CH also provided demographic estimates for the tweets, including gender and race for people whose Twitter account profiles included a picture of a person. The researchers narrowed their sample by searching for terms relevant to their focus (“wig,” “hair,” “mom,” and “comb”), randomly created a subset, and manually looked up Twitter profiles to hand code for racial identity. Screenshots of tweets representative of specific themes were used. However, “names and avatars [were] removed as per institutional IRB guidelines” (A. Williams & Gonlin, 2017, p. 991).
While Williams and Gonlin focused on entertainment media, Lee’s study showed how Black Twitter exposed bias in mainstream media. Lee manually searched the Twitter platform for six hashtags (#APHeadlines, #IfTheyGunnedMeDown, #DangerousBlackKids, #ICantBreathe, #CrimingWhileWhite, and #AliveWhileBlack) popularized after the killings of six Black people (Mike Brown, Eric Garner, Jordan Davis, Trayvon Martin, Tamir Rice, and Renisha McBride) (Lee, 2017). Lee’s publication featured Twitter usernames and verbatim tweets.
While the second set of scholars explains how Black Twitter reacts to and against different forms of media, the third group of scholars analyzed similar “Blacktags” (“racialized hashtags,” Sharma, 2013, p. 46), such as #IfTheyGunnedMeDown and #AliveWhileBlack, to study how identity is shaped and performed within Black Twitter. Focusing on linguistics, the scholars in this third category saw discourse within Black Twitter as aligned with African American Vernacular English (AAVE) and linked political power and collective action to the language being used. The first scholar to examine the linguistics of Black Twitter, Sharma, personally collected tweets in 2011 (without specifying whether this was done through the Twitter platform directly or with a third-party application) and quoted them verbatim without including Twitter handles.
The second scholar in this category, Florini, connected discourse on Black Twitter with the AAVE cultural tradition of signifyin’. Florini (2014) began collecting what they referred to as “‘Black Twitter’ timelines” in mid-2009 and claimed that they “used avatars and profile information to ascertain the racial identity of the participants as much as possible” (p. 226). In the publication, Florini quotes tweets verbatim with the Twitter handle and date included. The final scholar in this third category, Schiappa (2014), outlined the diverse ways in which Black Twitter is a unique Black community practice. Their research included quotes of tweets verbatim with the associated Twitter handle as well as screenshots of unedited tweets that include the tweet content, Twitter handle, likes, and timestamp.
In both the second and third categories, tweets were often reproduced verbatim or as screenshots, sometimes with Twitter handles and sometimes modified to exclude them. Across all three categories, researchers analyzed tweets and Blacktags, typically quoting tweets verbatim or including screenshots, often with the associated Twitter handles. These choices speak to broader conversations about best practices for ethical research on Twitter specifically, and with public data more broadly. Mason and Singh (2022) recently analyzed 136 education manuscripts that together quoted 2,667 tweets and found that at least one tweet was discoverable in 89% of manuscripts; on average, 55% of the tweets in a manuscript were discoverable, across various forms of tweet reporting (quoting, quoting verbatim, screenshots with or without usernames included or obfuscated, hyperlinks, paraphrase, excerpt, etc.). Mason and Singh assert that direct quotes of tweets from vulnerable populations may not be acceptable in certain situations. Ultimately, whatever choices researchers make about how to handle tweets require explanation and justification in the publication, so that the entire research community can move forward in its ethical conduct with public data. The aforementioned scholars rarely shared their considerations, but without established guidance, how would they know to do so? We have not yet arrived at a polished, holistic, and cohesive ethical understanding of best practices and research standards for public data, but there have been strides. In the next section, we discuss the origins of research ethics in human subjects research and where the topic rests today with regard to public data.
Research Ethics and Public Data
As previously mentioned, research using public data does not have well-established ethical guidelines in the same manner as human subjects research does (Shilton et al., 2021). However, human subjects research can provide a useful framework for conducting research using public data. In an overview of research ethics in human–computer interaction (HCI), Amy Bruckman describes traditional ethical human subjects research in any field as “balancing the potential benefits with the potential harms” and outlines three risks that researchers must mitigate (Bruckman, 2014, p. 464). The first risk is harm to the research subjects, the second is the possibility of disturbing the environment the researchers are studying, and the third lies with the researcher’s institution, should grave consequences result from ethical violations (Bruckman, 2014). We will return to the first two risks (to the subject and to the environment) in considering possible harm to Black Twitter. However, studies of public data such as social media often fall outside the scope of traditional human subjects research guidelines because of how those guidelines, and the regulatory bodies that apply them, define such research (Vitak et al., 2017).
Nevertheless, social media researchers are exploring what best practices and guidelines might look like as traditional human subjects research ethics expand to accommodate the unique and specific demands of public data research. Shilton et al. explore what they refer to as “pervasive data research”: research that relies on “rich information about people generated through digital interaction and available for computational analysis” and that is conducted by researchers throughout academia, government, and industry (Shilton et al., 2021, p. 1). In such research, participants are usually unaware that they are participating, which compromises their agency and autonomy. Shilton et al. (2021) argue that participant awareness, trustworthiness, and the recognition and balancing of power are key yet underexplored concepts in public (or private) data research that must be established to raise this research to the ethical standards of human subjects research.
Ethics review boards are often not equipped to discern what public data research requires, and in any case, the presence or absence of regulatory oversight does not by itself address the underlying ethical issues of research. For instance, industry research often proceeds without formal ethical oversight, because companies are more beholden to legal limitations and the court of public opinion. Social media companies may also use data for research purposes without users realizing it, as the public backlash to the Facebook “emotional contagion” study made clear; in that study, Facebook altered newsfeeds to test a psychological phenomenon (Hallinan et al., 2020, p. 1076). Although Facebook established new internal ethics review processes after this controversy, and other tech companies have similar mechanisms, these measures do not address the heart of the problem: ethics should be the inherent responsibility of each researcher rather than an external obligation of a group of people who are not always as familiar with the context, conditions, and potential consequences of the research as the researchers conducting it (boyd, 2016). Research in industry is increasingly informed by human-centered and equity-centered design (Nee et al., 2021), and involving end users and subject matter experts throughout the inquiry process is one step toward alleviating the potential risks and harms associated with industry research.
Researchers have also considered research ethics specifically in the context of Twitter. M. L. Williams et al. (2017) studied the ethics of using Twitter data in social research based on the views of people who use Twitter, online context, and algorithmic estimation. Surveying people who use Twitter, the authors found that 80% of respondents expected to be asked for consent before their content was published and 90% expected to be anonymized in such publications (M. L. Williams et al., 2017). Fiesler and Proferes (2018) also wrote explicitly about research ethics for Twitter. Seeking to develop ethical norms for research in social computing settings, they argue that user perceptions should be considered to “help inform ethical research practices” (Fiesler & Proferes, 2018, p. 1). They conclude with suggestions for how common ethical principles for human subjects research (e.g., beneficence, justice, respect) can apply to research using public data as well. Fiesler and Proferes also cite potential harms resulting from Twitter research. These include exposing personally identifiable information that users shared publicly during a crisis to obtain assistance, and harm to entire groups of people who are researched as part of an online community (Fiesler & Proferes, 2018).
With respect to vulnerable or marginalized populations, researchers should weigh additional considerations when studying such communities. Dym and Fiesler studied the social norms of transformative online fandom. Dym, Fiesler, and their participants encouraged outside researchers to spend time in the community they are studying before publishing and to ask permission before sharing “public” data (Dym & Fiesler, 2020a). Dym and Fiesler identified what keeps vulnerable members safe, how to protect their privacy, and when social norms break. Particularly important to researchers is the breaking point that occurs when they enter and engage with an online community without learning its norms. This breaking point does not have to be inevitable. What more can researchers do to study vulnerable populations online ethically? In the following section, we describe how we pursued answers to this question by examining Black Twitter.
Methods
Positionality
The first author identifies as a Black woman who engages with Black Twitter. Her research is informed by Black Feminist Thought and Digital Black Feminism. She focuses on how Black people influence technology as they survive and thrive. The second author is a White woman who researches marginalized communities online, including those built around identities she shares. Both work at a predominantly White institution.
Semi-Structured Interviews
The first author conducted 18 semi-structured interviews from August 2020 to September 2020 (see Table 1). Each interview lasted between 30 and 90 min, and participants were compensated with a $20 Amazon gift card. Participants were recruited through tweets about the study using the hashtag #BlackTwitter and a recruitment questionnaire. The questionnaire included the questions “Do you use the social media platform Twitter?” and “Are you familiar with Black Twitter?” We scheduled interviews with respondents who answered yes to both questions. We also used snowball sampling as a recruitment tool.
Table 1. Participant Demographics.
We conducted two pilot interviews to better estimate the length of an interview as well as to refine our interview questions. Previous research conducted about Black Twitter as well as Dym and Fiesler’s research inspired the initial questions in our interview protocol (Brock, 2012; Dym & Fiesler, 2020a). Our interview protocol involved questions about discovering Black Twitter, the content seen on Black Twitter, and a series of questions related to outsiders, research/researchers, and ethical dilemmas within Black Twitter. When asking participants about research on Black Twitter, we did not provide specific examples of research. However, some participants were familiar with dissertations or other forms of research published about Black Twitter.
The age range of participants was 24–53, and apart from one participant who identified themself as “Chinese/Asian-American,” the remaining 17 participants identified themselves as Black, with some participants distinguishing themselves as African American, Black British, Black African, Nigerian Black, or Kenyan Black. All but one of our 18 participants self-identified as a woman. We recognize that this skews the gender composition of our study. As reported by Pew Research in 2019, 48% of people on Twitter identified as women (Wojcik & Hughes, 2019). However, women make up a larger share of the most prolific tweeters: 65% of the top 10% of users, who create 80% of all tweets, identify as women (Hughes & Wojcik, 2019). The available data on Twitter demographics is not granular enough to represent Black women, Black men, and Black non-binary people. Regardless, the predominance of women among our participants may limit the representativeness of our findings.
Following Saldaña’s (2013) guidance on qualitative analysis, we open coded the transcripts. From the open codes, general themes surfaced through discussion. After an additional conversation about these themes, two findings emerged: attitudes toward research, and the perception of researchers as outsiders. These insights could also speak to research conducted with other vulnerable or marginalized communities.
Findings
Public Data Research, Researchers, and Black Twitter
The first finding involved opinions about research on Black Twitter. About one-third of the participants had not considered that there were academic papers written about Black Twitter, while 10 of the 18 participants knew about this possibility. We also asked participants about specific content from Black Twitter appearing in papers. Participants’ willingness to be quoted in an academic paper depended on the specific tweet in question, the context around the tweet, whether their consent was obtained, or the positionality of the paper’s author relative to Black Twitter. P2 spoke to several sentiments that many of the participants shared: “. . . if someone is doing qualitative research where it’s a conversation like this, and I can give more context to the things that were shared in the past, versus if someone wanted API access to my account . . . I wouldn’t want someone to just take those tweets and then do whatever analysis they want, because over the course of 10 years on Twitter, my opinions have changed. So, I think qualitative studies, even with the quantitative aspects, if you were doing a mixed methods study where it was part qualitative and then you were also farming some tweets with consent, that wouldn’t be awful to me. I think that would be okay as long as all parties were informed and okay with it.” (P2)
What this and other participants shared about context and consent tracks with previous work on attitudes toward Twitter research. These attitudes vary with factors such as the content of the tweet or how the analysis is conducted. Previous work has also suggested that being informed may matter more to users than being asked for consent (Fiesler & Proferes, 2018).
Several of those interviewed also understood that their content is public unless their account is private. Therefore, they would understand if academic papers quoted their Twitter content. When asked whether they wanted to be quoted in a paper, only P1, P5, P6, and P14 responded “yes” without stipulations. Apart from P7, who felt nervous and concerned about the possible context of a quoted tweet, the remaining 13 participants would be willing to have their tweets quoted in an academic paper depending on the context of the tweet (P9), consent/permission (P2, P3, P4, P18), the researcher’s identity as well as context and consent (P10), the content of the tweet (P8, P12), whether the researcher is an insider of Black Twitter (P16), whether the participant can select the tweet (P11), whether the researcher gives a heads-up (P15), or whether the tweet is already public (P13, P17). Our participants overwhelmingly did not want their usernames included in an academic paper. In asking these questions of a different marginalized group, mostly LGBTQ+ people in online fandom, Dym and Fiesler (2020a) saw a similar tension between privacy and attribution and suggested that resolving the tension is highly context-dependent.
Researchers as Outsiders
Our interviews also surfaced another finding that Dym and Fiesler (2020a) identified: the perception of researchers as outsiders. As Bruckman (2014) noted, part of research ethics for human subjects research involves considering the risk of disturbing the environment being studied. In the case of Black Twitter, this disturbance can occur when researchers are seen as outsiders. Participants gave varied responses when asked to define an outsider to Black Twitter. Six of the 18 participants explicitly defined an outsider to Black Twitter as someone who is not Black. As one participant said, “if you’re not Black, you’re visiting Black Twitter” (P9).
The overall responses, however, were nuanced. Throughout their interviews, participants defined an outsider as someone who does not engage with, understand, or relate to Black culture. Many participants said that not being Black would not automatically make someone an outsider; conversely, two participants who self-identified as Black considered themselves outsiders. As one of those two participants put it, “I feel like I’m an outsider. I mean, I’m a Black intellectual. I study Black women. But, I’m not willing to get in there and engage with Black Twitter the way that Black Twitter probably expects me to do. Right? Like you supposed to be there supporting me or supporting us. Like, yeah, I like something I’ll retweet it, but I’m not really out here distributing the information that is required.” (P5)
Two other participants stipulated that outsiders to Black Twitter could be White people who “want to come in and drop their opinions” (P18) or people who are not “lending their privilege to the plight of Black people” (P17).
Examples of outsiders provided by participants include bots, law enforcement, journalists or media companies, and researchers who are not part of the community. Krafft et al. (2017) outline ethical guidelines for using bots in online research, but what about researchers themselves? Certainly, examples of what not to do abound. P10 specifically brought up the story of a researcher who scraped tweets with the #BlackinIvory (or #BlackintheIvory) hashtag and then announced on Black Twitter that they had done so, offering to share the tweets with anyone interested. P10 summarized the collective response: “ . . . and everybody was like, ‘this, isn’t your data friend, like who told you to do this?’” (P10). The researcher, labeled as an outsider to the community around the hashtag, exposed the lack of consent and transparency inherent in certain manifestations of public data research.
When discussing their opinions about participating in research depending on the researcher’s positionality, two participants said they would not mind who the researcher was. Another two said it would depend on the research context or the researcher’s source. However, 14 of the 18 participants said that it would matter to them who the researcher was. P1 responded to this question: “I think it would be kind of weird to feel like I was being analyzed by White people. That would seem strange” (P1).
Several participants expressed that White people doing research on Black Twitter specifically, or on Black people in general, seemed colonizing. P11 had this to say on the subject: “I don’t mind Black people in the academic world sort of reflecting and researching these experiences because I think, on some levels, it’s a shared experience. And I think those voices need to be heard in those areas. However, when it’s a White person, it just comes off as colonization and maybe that’s harsh, but that’s how I feel.” (P11)
We ended by asking participants what specific advice they would give to researchers looking to study Black Twitter. Nearly every participant had an answer to this question. Their advice for researchers included the following. Be sensitive about the way you ask questions (P1), and listen and ask open-ended questions (P4). Do not have assumptions or expectations for how people within Black Twitter will engage or respond (P3). Acquire context, especially for colloquialisms (P7), but also “let their words be their words” (P13). Understand cultural competency (P5) and have racial awareness (P18), including taking into consideration the history of Black people and research (P6). Ask for consent and determine whether or not people want to be research participants (P10). Being a part of the community does make a difference (P2), but also take into account your own Twitter usage, who you follow, and how much you engage with Black Twitter (P8, P9). Check your motives and perspective as a researcher (P11), and be transparent (P11, P15).
Finally, our participants also emphasized that it is important to recognize that Black Twitter is diverse (P1), does not represent all Black people (P12), and is not a monolith (P12, P17). We bear this in mind as we discuss recommendations next.
Discussion
Although the victims of unethical medical research in the Black community often did not have the opportunity to share their experiences, stories about poor public data research in online communities should not go untold. Moreover, the insights gained from our interviews can potentially help other online communities with vulnerable populations. Our participants represent a small subset of the many people who engage with Black Twitter. However, the vulnerability and honesty with which they shared their perceptions and lived experiences can guide researchers toward better practices and a better understanding of the larger population of Black Twitter and other marginalized online communities.
Weigh Attribution Versus Privacy Risks Within Your Context
Returning to Amy Bruckman’s three risks of research involving people and technology, our findings, situated in the history of research with Black communities, show how the first two risks manifest on Black Twitter. The first risk, harm to subjects, appears in two ways: using content without permission and failing to properly credit the originator of content in research. When the contributions of Black people yield financial returns for appropriators, such as the lab that took and sold Henrietta Lacks’s immortal cells, credit often takes decades to secure, as does any remittance due to survivors. The appropriation of online content created by Black people is a continuation of the long-standing practice and tradition of appropriation from Black culture promulgated by dominant cultures in many contexts (Greene, 1998–1999; Thompson, 2015). Previous research on Black Twitter found participants lamenting “culture vultures” like Buzzfeed that repost content to their sites without permission or, in some cases, credit (Klassen et al., 2021, p. 13).
The enslaved Black women who were inhumanely experimented on were not even footnotes in early US gynecological research, and Black historians uncovered their humanity and contributions to science only generations later. When it comes to Black Twitter data, even among our small set of participants, there existed a wide variety of desires about attribution. Dym and Fiesler found value tensions in fandom communities between a preference for attribution and a need for privacy, and their recommendation favored privacy over attribution (Dym & Fiesler, 2020a). Our participants tended to want researchers to give them the choice of how their content would be used and in what context. Some people will not want their tweets amplified outside of the platform in an academic publication. Other people would prefer to get credit for their tweet regardless of whether it shows up on Buzzfeed or in your paper (Klassen et al., 2021). Instead of assuming, ask for permission to use tweets verbatim and confirm whether a person wants their username to be included. If the person does not respond, consider paraphrasing the tweet or using ethical fabrication instead (Markham, 2012). Researchers can weigh the context and research topic to determine which approach to use with Black Twitter data to best respect the content as it is removed from its intended audience.
Examine Your Status as an Outsider to Your Research Community
The second risk, disturbing an online environment as a researcher, involves disrupting conversations or steering comments in specific ways. To a number of Black Twitter users, some, if not many, researchers who are not already insiders to Black Twitter are outsiders alongside bots, journalists, law enforcement, and non-Black onlookers. For the Black men recruited to the Tuskegee study, outsiders to their community (medical researchers) disturbed that community by pretending for decades to provide medical services. The wives and children of these men experienced secondary and tertiary effects of the study, and those who were alive after the settlement were able to receive compensation (CDC, 2021). As an online community, Black Twitter is nebulous and operates as an open secret on the Twitter platform, but outsiders can and do pervade the space (Klassen et al., 2021). Infiltration for the sake of research can be jarring and yield unwelcome responses, as the previously mentioned researcher found after scraping tweets from the #BlackintheIvory hashtag and then messaging users following that hashtag to offer the dataset. Harm to subjects on Black Twitter can manifest in such micro- or macro-aggressions depending on how research is conducted. Cultural insensitivity, incorrect tone, or an overall inability to “read the room” within the online community of Black Twitter can negatively affect subjects.
Researcher Positionality and Reflexivity Are Integral to Public Data Research
Incorporating researcher positionality and reflexivity into research design is a lesson from traditional human subjects research that we see applied less commonly in public data research. The manifestations of a researcher’s positionality can be fixed (age, gender, race, ethnicity, etc.) or fluid (political beliefs) and are put in relation to three areas of a research endeavor: “(1) the subject under investigation, (2) the research participants, and (3) the research context and process” (Holmes, 2020, p. 2). Both self-reflection and reflexivity are used to help researchers “identify, construct, critique, and articulate their positionality” by “acknowledg[ing] and disclos[ing] their selves in their research, seeking to understand their part in it, or influence on it” (Holmes, 2020, p. 2).
Milner warns of dangers researchers may face that are “seen, unseen, and unforeseen,” which positionality and reflexivity can address, specifically in research affected by or involved with racial or cultural contexts (Milner, 2007, p. 388). Liang et al. bring into relief four tensions in conducting HCI research with marginalized communities (exploitation, membership, disclosure, and allyship) and point to positionality and reflexivity as ways to identify potential researcher bias and power hierarchies (Liang et al., 2021). While some research projects may remain unchanged after researchers examine their positionality and practice reflexivity, these exercises give researchers an excellent opportunity to consider how their positionality affects their research design. The perspective gained may yield new or different ways to ask research questions, collect data, perform analysis, or share results. Ultimately, this may result in research participants who are more comfortable with the research and more willing to share their data.
Consider an example from Vakil et al. (2016): a Black researcher’s social positionality and racial solidarity afforded them research access to a Black community, whereas a White colleague in the same research group was denied access. By assessing their identity in relation to their research and to those upon whom research will be conducted, researchers can more readily discern ways to navigate relationships, potential harms, and possible ethical dilemmas within the research design. Another story from Ogbonnaya-Ogburu et al.’s (2020) work highlights the importance of having diverse research teams. In story 3, the author discusses a research study focused on unhoused youth. Neither they nor their research partner, who helped conduct the interviews, analysis, and write-up for the project, had experienced homelessness. However, the author identified as Black, and the research partner did not. Two telling instances over the course of the project show how this diversity aided the process. First, the primarily Black participants shared longer and more detailed responses with the author than with the non-Black researcher. Second, when conducting data analysis, the Black author was able to “recognize and interpret the participants’ more racially coded remarks” that may have “gone unnoticed” had they not been present (Ogbonnaya-Ogburu et al., 2020, p. 6).
As a Researcher, Inform Yourself of the History and Current Cultural Context of Your Research Community
If a researcher does not share the identity of their research participants, conducting that research alongside a researcher who does can make for a more fruitful experience for everyone involved. However, as Harrington et al. point out, sharing an identity with your research community does not automatically yield “a pass” (Harrington et al., 2019, p. 17). There are still gatekeepers and histories of research injustice to contend with, emotional labor required to get past the gate, and decisions to make about recognition and whether members of the research community want to associate with your research practices (e.g., design workshops) (Harrington et al., 2019). Therefore, consider your personal cultural competencies and racial awareness, the history of research and of the potential community, and your positionality. Here, Dym and Fiesler’s prior work again offers guidance for studying vulnerable online communities as an outsider. One participant they interviewed bemoaned “parachute journalism,” wherein “people come into the community, they parachute in, they don’t really look around, and they leave,” which could “easily disrupt community norms and perpetuate harm” (Dym & Fiesler, 2020b, p. 14). If a researcher wants to draw conclusions about a community without talking to its members, that researcher needs, at the very least, to know something about that community.
Conclusion
Just as the atrocities of medical research in the Black community diminished trust in the healthcare system for generations, researchers who have not examined their positionality and ill-informed research of public data in marginalized online communities may negatively affect those communities in unforeseeable ways far into the future. When considering whether and how to report tweets in research, consider the context and content, whether permission can be acquired, whether attribution is desired, and whether the researcher’s positionality and reflexivity have been examined before, during, and after the research design.
Hampton (2021) sums up the takeaways of this work most poignantly: While science is often perceived as “the absolute truth” or “value-free,” scientists’ findings are shaped by their social context, that is, the dominant ideologies (i.e., racism, sexism, heterosexism, etc.) which impact the scientific questions and problems they pursue, the methods they employ, and the conclusions at which they arrive. Consequently, we must critique in a principled manner the production of scientific knowledge, examining researchers’ situated knowledge (i.e., the idea that “knowledge is always produced from a specific disciplinary, ideological, and social location”), their institutions, their funding sources, their methods, the people who will benefit from the research, and the people who will potentially be harmed by the research. (p. 7)
In the above quote, “science” could be replaced with “technology.” Scholars like Ruha Benjamin and Meredith Broussard write about the illusion of neutrality and infallibility attributed to technology, algorithms, and so on (Benjamin, 2019; Broussard, 2018). The social context that Hampton refers to corresponds to a researcher’s positionality and reflexivity, as we discussed extensively above. The critique that Hampton (2021) calls for is crucial across all axes, but especially the last one: “the people who will potentially be harmed by the research” (p. 7). This research study addresses research ethics for public data in marginalized online communities like Black Twitter. We interviewed 18 participants about their attitudes toward and perceptions of research, researchers, and their impact on Black Twitter. The participants shared stories, experiences, and perspectives that depict a general willingness to have their tweets included in academic research, though mostly under certain conditions. Participants also expressed issues with researcher-outsiders who engage without care for the community’s norms, forethought about how their research will affect the community, or cultural competency. Ultimately, the participants provided advice, suggestions, and recommendations on how to conduct research with their community, which we distilled into takeaways for researchers hoping to study Black Twitter or other marginalized online communities.
Do not go into public data research, regardless of your positionality or reflexivity, without doing your due diligence. Through observation or prior research, discover the norms and history of the online community you are interested in. Be sure to examine your positionality and reflexivity throughout the research process. Do not make assumptions or rely on stereotypes about a vulnerable or marginalized community that could obstruct genuine insights or discourage people from sharing authentic experiences. Consider how your research and methods may harm the community represented by the public data with which you engage. Finally, when striving to do no harm, give yourself grace when you fail. Apologize sincerely, learn from your mistakes, and commit to doing better moving forward. When discussing risks and harms to research participants, regardless of whether it is through human subjects or public data research, what we are really contending with is justice. Hopefully, we can all conduct our research in this manner whether or not it involves public data.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation (#1704303). The publication of this article was funded by the University of Colorado Boulder Libraries Open Access Fund.
