Abstract
Frequent public uproar over forms of data science that rely on information about people demonstrates the challenges of defining and demonstrating
Keywords
Introduction
In the era of ubiquitous digital devices, researchers are increasingly able to draw conclusions about people's health, habits, beliefs, and practices using methods that require no contact with, or awareness by, research subjects. Increasing
Pervasive data researchers in academia, industry, and government face a set of nested ethical problems that emerge from the combination of the datafication of human activity, growing mistrust of digital research practices, and mismatched norms between datafication realities and the traditional importance of research participant autonomy. From the perspective of research ethics, the notable change is not the “bigness” of digital datasets, but the ubiquitous nature of the data sources and collection methods that allow researchers to combine, analyze, and predict human behavior using multiple, partial, and disconnected datasets. Pervasive data research commonly collects such data through indirect means via partnerships with platforms, or through purchasing or scraping digital data. Even when researchers use direct means of data collection, they may have opportunities to repurpose or decontextualize consented data in ways the subject is not able to predict. The availability of pervasive digital data has destabilized the ethical relationship between researchers’ methods for data collection and research subjects’ autonomy to control their participation in research. Though many pervasive data studies have been uncontroversial, too frequently, participants react with alarm (Hallinan et al., 2020; Zimmer, 2018).
Existing institutional backstops designed to support public trust in human subject research have not met the challenge of establishing trustworthy pervasive data research practices. A recent column in
Pervasive data research is not the first field confronted with the shortcomings of traditional institutional ethics regulation. Ethnographic researchers have grappled with research ethics both within and beyond the framework of institutional review (Davies, 2012). Questions of whether and how to make the ethnographer's presence known to research subjects have been long debated in the literature (Bernard, 2006). Pervasive data research has more in common with ethnography than is immediately obvious. The instruments are different—human senses instead of digital sensors, individual sensemaking instead of algorithmic pattern-matching—but both forms of research rely on integration and interpretation of multiple data streams, and both require judgment about what features of a context are relevant for making meaning. The ethical challenges of research with pervasive data—the richly personal nature, the emphasis on observation, integration of multiple data types, and the drawing of inferences and conclusions based on patterns—are the same challenges that can be found in the world of ethnography and participant observation. We are not the first to make this comparison; for example, Muller et al. (2016) discuss epistemic and ontological parallels between big data-based machine learning and grounded theory. However, here we explicitly use ethnographic research ethics as a guide for responding to challenges of awareness, power, and mistrust in pervasive data research. Ethnographers have deep experience in building trust with research subjects, as well as trust in the appropriateness and acceptability of their research—sometimes outside of or in conflict with the primary institutions and expectations of research ethics. Data scientists can use this experience, and the practices of ethnographic intervention, to help define trustworthy practice for pervasive data research.
We argue that pervasive data researchers must, like ethnographers, grapple with challenges of research subject awareness and acceptance as well as the appropriateness of their research, especially given data science's relationship to corporations, governments, and other sites of institutional domination. However, pervasive data research faces a challenge unaddressed by ethnographic tools. Ethnographers must explicitly negotiate awareness and power with their participants because of the physical embodiment their research requires, building trust along the way. But the ease of disembodied digital data gathering flattens the institutional structures traditionally relied on for building trust with research subjects. The challenge for pervasive data research is not only to center discussions of awareness and power in its research practices, but also to dig out from this disembodiment: to find ways to excavate and retexture modes of trust building.
This paper proceeds as follows. First, we detail evidence that many data subjects are largely unaware of the research uses of their digital communications and actions, and when they become aware, too frequently express unhappiness and alarm. We then introduce ways that ethnographic and participant-observation research have dealt with ethical challenges of research awareness as well as representational justice issues that stem from the power dynamics between researchers and research participants. Finally, we adapt those lessons for pervasive data and outline a foundation for trustworthy pervasive data research by engaging researchers in 1) rebuilding participant awareness and 2) excavating explicit considerations of power beyond traditional research ethics concerns.
The trustworthiness challenge
Trust and trustworthiness are complex constructs in the sociological, anthropological, and ethical literature, with debates over both the function and mechanics of trust and trustworthiness. In sociology, trust is often thought of as a construct necessary to deal with complexity and complex decision-making (Luhmann, 2017). Trust allows people to cooperate toward common goals, to pursue disparate goals through partnerships, and to collectively manage uncertainty. Adapted for social sciences research, then, trust enables necessary forms of participation between researchers, individuals, and groups to foster the production of knowledge while managing the possible risks of entering into such partnerships.
Trust is a central problem in pervasive data research because of the methodology's reliance on datafication. Van Dijck (2014) argues that researchers should be wary of datafication as a core method for studying human behavior because the very possibility of datafication relies on fraught and brittle institutional trust. Data subjects may be willing to trade their behavioral data to corporate digital platforms in exchange for services, but that does not give researchers a just claim to that data or a good reason to expect the data to be representative of the underlying phenomena they seek to study. And because data science frequently contributes to automated decision-making, researchers have obligations to consider not only the potential impacts of new knowledge, but whether systems built with that knowledge would be trustworthy.
How might pervasive data researchers act in trustworthy ways in such a fraught environment? Trustworthiness is a form of right action (ethics) that emphasizes our duties and promises to other people (Tullberg, 2008). Our commitments to others can be made more credible (providing a reliable basis for the trust of others) through assurance mechanisms ranging from interpersonal dynamics to institutional constraints such as social norms and law (Hardin, 1996).
For researchers, commitments to research subjects have been defined for decades by principles put forth by the 1979 Belmont Report: respect for persons, beneficence, and justice. Belmont shaped assurance through policy constraints placed upon researchers in the U.S., Canada, Australia, and Europe. For example, in the U.S., trustworthy research practice is codified in the U.S. Common Rule, which interprets respect for persons as meaning “that subjects, to the degree that they are capable, be given the opportunity to choose what shall or shall not happen to them” (Office of the Secretary of The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979). While necessary for trustworthiness, the principles of the Belmont Report and the Common Rule do not exhaust the commitments and considerations that researchers ought to weigh with respect to human research subjects, particularly for social scientists. As discussed below, the Belmont principles were designed to balance conflicting duties of care for physician-researchers, but social science researchers have oriented their approaches to trustworthiness around considerations of awareness and power. Therefore, social science researchers have had to come to terms with their own disciplines’ histories of exploitation, extraction, and co-optation, as well as their positionality as they foster relationships with research subjects (Sultana, 2007).
Trustworthy practice for pervasive data research—ensuring that researchers meet commitments like respect for persons, beneficence, and justice—is problematized by the ecosystem where digital research takes place. Research participants routinely deny knowledge of widespread research conducted with digital data and express that, while they might be willing to participate in digital data research, they expect to be asked for consent (Fiesler and Proferes, 2018; Gilbert et al., 2021; Hudson and Bruckman, 2004). However, a commitment to respect for persons, defined as self-determination for participants in the Common Rule, is difficult to ensure in pervasive data research. Informed consent is not always logistically or philosophically appropriate for research in the big data age. Logistically, there is now a large amount of data about people available online. Securing individual consent to use this data would be incredibly challenging—if not impossible—where individual identity was knowable, and arguably unethical where doing so would require collecting even more personal data (Ioannidis, 2013).
Philosophically, informed consent for pervasive data research suffers from a number of problems. Metcalf and Crawford (2016) point out that codes of informed consent were established specifically to govern physician-researchers, who balance the broad social interest in research results with their individual duty of care for a patient. The procedures and norms IRBs use to generate trust operate with an unstated assumption that these social conditions hold for all types of research. However, the trust relationships between computational social scientists, data scientists, and the public seldom conform to the social conditions that hold between physician-researchers and research subjects. The norms of pervasive data researchers (unlike, say, the norms of ethnographers) do not currently require preexisting, personal, or even explicitly declared relationships with the communities they study to collect data. The typical scale of data science manifests in numerous ethical and epistemic challenges for understanding the ethical interests of data subjects that extend beyond matters of logistics (Hanna and Park, 2020). Finally, Richards and Hartzog (2017) identify “pathologies” of consent resulting from overuse in the digital age. They argue that consent works best when it is given infrequently, when the harms are visceral and easily imagined, and when the stakes of a decision are significant. Pervasive data research meets some, but not all, of these standards. Explicit, informed consent to research participation happens infrequently for many participants. However, the harms of data research are rarely visceral or easily imagined. And it is unclear to what degree individuals consider the stakes of participating in research. 
Therefore, while it is unclear whether informed consent is philosophically the right mechanism to navigate the relationship between data scientists and data subjects, it is the case that many of the norms and mechanisms that other forms of research use to achieve informed consent do not translate well to pervasive data research.
Challenges to defining trustworthy practices for pervasive data research extend beyond consent and echo larger social concerns with the power and social impacts of datafication. People are increasingly aware of—and alarmed by—the prevalence of datafication of their digital communications and activities (Auxier et al., 2019; Beninger, 2017; Dubois et al., 2020; Fiesler and Proferes, 2018; Golder et al., 2017; Gruzd and Mai, 2020; Hallinan et al., 2020; Hudson and Bruckman, 2004). For example, Hallinan et al. (2020) examined public reaction to Facebook's emotional contagion study. One of their findings was that commenters objected to the idea of “living in a lab” or being studied without their awareness. Research also indicates that unwillingness to participate in pervasive data research is a larger concern among marginalized communities, where issues range from fear of surveillance and deportation (Nebeker et al., 2017b) to concerns that deployed technologies will fail to represent the needs and realities of user communities (Winchester, 2018) and to unwanted amplification of content or communities (Dym and Fiesler, 2020). Moreover, as Hoffman and Jonas (2017) point out, the costs of online participation are unequally borne by women and people of color, obligating researchers to consider the differential needs of vulnerable data subjects. Both distrust in platforms and concern for the uneven risks of surveillant research methods signal challenges of social power—who bears risk, and how partnerships with platforms shape that risk—that researchers must navigate.
Alarm over digital datafication was not (primarily) created by researchers. Influential works like Zuboff (2019)
Parallel to the growing public alarm about datafication has been a restriction of research access to some forms of pervasive data by social media platforms. As Tromble (2021) traces, platforms have moved away from open API access for researchers to narrower, more careful data access efforts enabled by tools like differential privacy (King and Persily, 2020). Tromble (2021) characterizes the “post-API” era as an opportunity for reflection by pervasive data researchers on the rigor and ethics of their data practices.
Data scientists are not alone in facing challenges of defining trustworthy research practice, coping with a research ethics governance infrastructure misaligned with their methods and approaches, or responding to participant worries about invasive techniques and complex power dynamics. We believe there is useful instruction in the history of another research methodology that has struggled with trustworthiness: ethnography.
Trustworthy ethnographic research
In a classic ethnographic methods textbook, Spradley writes: No matter how unobtrusive, ethnographic research always pries into the lives of informants. Participant-observation represents a powerful tool for invading other people's way of life. It reveals information that can be used to
Substitute the words “pervasive data” and the concern is the same. Reflecting first and foremost on whether data use will affirm the
This dedication to the rights and interests of the subjects of participant-observation comes from a long and painful history of the use of ethnographic methods. The history of ethnographic research is also one of colonialism. The earliest ethnographies were conducted by American and European academics through fieldwork among indigenous peoples, and this practice continued well into the 1980s (Thrift, 2003). The power-laden character of the colonial encounter in ethnographic fieldwork was recognized as a challenge from early on, as ethnographers were acutely aware of how they were seen by their informants as extensions of colonial government (Stocking, 1991). But as postcolonial movements took hold in academia, and particularly in anthropology (Asad, 1995), scholars began to grapple with, as Thrift describes, “whether it was possible to have encounters with others which were not inevitably, in some sense, colonial in form and content and had some genuine ethical weight” (2003: 107). A recentering of the rights of, and obligations to, research subjects was the result of this field-wide reckoning. Today, ethnographers are expected to reflect on their power as researchers, as well as on what they are taking from the communities they study. As Fine and Weis write, “Researchers can no longer afford to collect information on communities without that information benefiting those communities in their struggles for equity, participation, and representation” (1996: 293–294).
Ethical concerns about early ethnographies were not limited to colonialism but also extended to other historically disenfranchised groups. One of the most famous examples of controversy over ethical issues in ethnography surrounded Laud Humphreys’ 1968 dissertation and later book titled
While the ethical merits of the study are debated, it is clear that We’re so preoccupied with defending our privacy against insurance investigators, dope sleuths, counterespionage men, divorce detectives and credit checkers, that we overlook the social scientists behind the hunting blinds who’re also peeping into what we thought were our most private and secret lives. But they are there, studying us, taking notes, getting to know us, as indifferent as everybody else to the feeling that to be a complete human involves having an aspect of ourselves that's unknown (1970: B1).
In this telling, the social scientist is not only invisible (behind hunting blinds), but also uncaring and fetishistic, part of a discipline that has a “peculiar taste for nosing around oddballs.” Von Hoffman concludes that even if Humphreys’ motives were good, “no information is valuable enough to obtain it by nipping away at personal liberty” (1970: B1).
As research ethics legislation began to be codified internationally, researchers’ growing engagement with the ethics of ethnographic fieldwork collided with university review. Despite ethnographers recognizing the need to build trust with both the communities they studied and the institutions that employed them, many found the IRB process to be an obstacle to be overcome rather than a trust-building opportunity. The difficulties ethnographers faced in IRB review closely mirror the difficulties that pervasive data researchers now face with institutional review mechanisms. In Patricia A. Marshall's telling of this history, the primary challenges for ethnographers were “first, professional competency of IRBs to evaluate anthropological protocols; and second, applications of requirements for informed consent” (2003: 272). Because IRBs lacked members experienced in ethnographic methods, relatively low-risk research proposals were more likely to be elevated to the highest level of scrutiny, Marshall reports. And “the legalistic rendering of consent models used by most IRBs fail[ed] to recognize the social construction of informed consent as an act of communication” (2003: 274).
In response to the difficulties IRBs posed for ethnographers, Marshall offered recommendations that have since been adopted within fields such as anthropology and have eased (if not entirely resolved) the difficulties she identified. Marshall's recommendations included calls for representation of ethnographers on IRBs and outreach to IRBs from ethnographers to communicate ethnographic best practices. She also recommended robust documentation of research protocols for informed consent within the discipline, educating policymakers about the relevant challenges to informed consent for ethnographic fieldwork, and education within disciplines to include ethical guidelines in methodological training. Lastly, she called for research into how university review boards evaluate research proposals, observing that “[i]nformation on the decision-making processes used by IRBs in approving or rejecting a research proposal would be useful” (Marshall, 2003: 280) for crafting more effective and ethical ethnographic fieldwork protocols.
Ethnographers have developed a suite of techniques to gain the trust of participants. First, ethnographers must gain entrée: the permission of participants to be in their space and lives. As part of gaining entrée, ethnographers typically communicate their research objectives to participants and engage their participants in ongoing conversations about the research in progress. Another important facet of gaining entrée or permission to conduct the research is ensuring that participants receive something meaningful for their participation. Spradley's textbook recommends that “every ethnographic research project should, to some extent, include a dialogue with informants to explore ways in which the study can be useful to informants” (1980: 22–23).
Entrée is only the beginning of awareness in ethnographic projects. Ethnographers frequently encourage research participants to read parts or all of their analysis.
Some ethnographic traditions move beyond entrée and participant checking to
These methodological interventions accompanied a broader, deliberate, decades-long theoretical and political shift that decoupled ethnography from the levers of colonial, state, and corporate power (Alkhatib, 2019). This shift consisted of several key components. One was a series of field-defining self-critical investigations of how the discipline had functioned as an agent of these powers (Asad, 1973). Another was the gradual growth of broader demographic representation within the discipline beyond historically white male cohorts (Patterson, 2020). Yet another component was a shift in topics of study, answering the call for
Excavating awareness and power in pervasive data research
Data collection, once the province of researchers, is now dominated by companies and governments. Growing distrust in big data research hinges on the fact that people increasingly realize how vulnerable the datafication of their lives makes them to both commercial platforms and governments, which use pervasive data to sell and surveil, to categorize and control, and to discipline and discriminate. By employing pervasive data as a tool for research, data scientists participate in this legacy and, much like ethnographers grappling with the extractive, colonial legacy of their methodology, must take specific action to address their place in the entangled social problems of digital data analysis. This means data scientists should be thinking about all of the powerful corporate, state, and societal forces entangled in big data. However, data scientists must do so in an environment in which traditional structures that support trustworthiness (clear practice norms or guidelines, direct interaction with research subjects, approval by ethics review boards, and distinctions between academic and commercial benefits) are absent or much less visible. How can we excavate structures to support trustworthiness that have been flattened in pervasive data research?
To retexturize this flattened landscape, we recommend that data scientists probe appropriateness and complex potential harms using two lenses directly inspired by ethnography. First, data scientists can learn from ethnographers’ contextual sensitivity and experience helping research subjects interpret research participation by reflecting on
Awareness
Supporting awareness of datafication begins with reflection on the nature of the communications or traces being used for research. We map awareness of pervasive data to two spectra based on how digital traces are created: traces created in private to public settings, and traces created by intentional to automatic means. Private, intentional data trails are “secrets”: for example, texts to a spouse or family photos. People are aware they are creating communications or documentation, and also make conscious efforts not to share them widely. Public, intentional data trails are “broadcasts”: for example, tweets. People are aware they are creating communication or documentation and purposefully share them widely, even if they may not understand the extent of their reach (Proferes, 2017). Private, automatic data trails are “espionage”: communications or documentation created without human intervention or awareness and not widely shared. Examples include the data collected by a DVR, smart fridge, or thermostat. “Espionage” data can also include geolocation or telemetry data collected as part of the normal functioning of devices like smartphones. Though users may be aware of functions that require such documentation, they may not know the extent to which it is being collected (Hannay and Baatard, 2011). Finally, public automatic data can be thought of as “exhaust”: documentation captured in public that individuals are not aware they are putting out into the world. Examples are CCTV camera recordings or satellite images of a home.
We recommend that researchers reflect on where their data-gathering methods fall on the private/public and automatic/intentional spectra (Figure 1) and use this reflection as a guide for considering both awareness and power implications of their research. Researchers using “broadcasts” should consider that while participants were likely aware of creating a communication, they may not be aware of its potential for use in research. Crucially, using “public” data does not automatically relieve the researcher from considerations of participant awareness, because awareness of creation is not necessarily awareness of research use. Data scientists using “espionage” have an even higher bar to clear, as they need to make participants aware of both the existence of a data trace and its use for research.

Figure 1. Spectra of data awareness.
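The two spectra described above can be sketched as a small lookup, which may help research teams record where each data source falls before analysis begins. This is only an illustrative sketch, not part of the framework itself: it assumes each spectrum can be simplified to two endpoints (the text describes continuous spectra), and the names `Setting`, `Creation`, `classify_trace`, and `awareness_gaps` are hypothetical.

```python
from enum import Enum


class Setting(Enum):
    """Where a trace is created (simplified to the two ends of the spectrum)."""
    PRIVATE = "private"
    PUBLIC = "public"


class Creation(Enum):
    """How a trace is created (simplified to the two ends of the spectrum)."""
    INTENTIONAL = "intentional"
    AUTOMATIC = "automatic"


# The four labels used in the text, keyed by (setting, creation).
QUADRANTS = {
    (Setting.PRIVATE, Creation.INTENTIONAL): "secrets",
    (Setting.PUBLIC, Creation.INTENTIONAL): "broadcasts",
    (Setting.PRIVATE, Creation.AUTOMATIC): "espionage",
    (Setting.PUBLIC, Creation.AUTOMATIC): "exhaust",
}


def classify_trace(setting: Setting, creation: Creation) -> str:
    """Return the quadrant label for a data trace."""
    return QUADRANTS[(setting, creation)]


def awareness_gaps(creation: Creation) -> list[str]:
    """List what participants may be unaware of (an assumed reading of the text).

    Research use is treated as a possible gap for every trace; automatically
    created traces add a second gap, since people may not know the trace exists.
    """
    gaps = ["research use"]
    if creation is Creation.AUTOMATIC:
        gaps.insert(0, "existence of the trace")
    return gaps


# Example: thermostat data is private and automatic.
label = classify_trace(Setting.PRIVATE, Creation.AUTOMATIC)
gaps = awareness_gaps(Creation.AUTOMATIC)
```

Real data sources will often sit between the endpoints, so a label produced this way is a prompt for reflection rather than a verdict on whether a study is appropriate.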
Meaningful informed consent is one standard for raising data subjects’ awareness of research. But for pervasive data researchers who can't secure meaningful consent because of scale, pervasiveness, or other issues, adaptations of entrée and participant checking should evolve with the field. As a starting point, they might include website pop-ups, such as those used as part of GDPR notification requirements, that explain the research and its risks to participants and that allow potential subjects to opt out. Another way to build awareness is to increase research subjects’ participation in the research. Increasing participation will require researchers to think about what participation looks like for the community or population they study (Sloane et al., 2020). Defining a community or population encourages researchers to resist universalizing data science findings, instead paying particular attention to who is and who is not included in the data under analysis (Costanza-Chock, 2018; Hargittai, 2015). Participatory action research (Khanlou and Peter, 2005) has grappled extensively with questions of participation, motivation, and accessibility, and can guide data scientists on challenging questions such as how to define a community, how to structure participation, and how to ensure representation of stakeholders across a community.
Power
Pervasive data researchers and the institutions that support them should also explicitly consider power relations and representational justice: they must consider the appropriateness of converting digital traces into research data, much as ethnographers have learned to consider when and if they should exert their power to transform particular groups into research subjects. However, just as with awareness, pervasive data researchers face an additional challenge. Ethnographers often experience both the power and vulnerability of their participants because of the embodied, affective experiences of being in the field (Ceglowski, 2002). Researchers who have collected or acquired datasets without direct contact with participants do not have this embodied experience as a reminder or cue.
Therefore, pervasive data researchers must remind themselves to be reflexive about the power they hold. They should reflect on how their research question development, data creation or data gathering techniques, data analysis, and writing benefit from their power over data subjects, as well as on the context of their research and their own subjectivity (Hegelund, 2005). They should consider whether it is appropriate to make a given community, stakeholder group, or population more vulnerable either by creating new forms of data (which may be used by other parties to increase their vulnerability) or through secondary uses of data: by making research data out of traces or communications created for other purposes. This consideration might involve spending time (virtually or physically) in a community to understand its norms, collaborating with a community to serve its needs, or speaking to community gatekeepers to understand specific harms, such as amplifying content beyond its intended audience (Dym and Fiesler, 2020).
Using pervasive data for research increases the vulnerability of the people included (and potentially other people like those included), whether by amplifying their behaviors and beliefs, showing new connections or inferences between their activities and habits, or applying categories or labels to their actions. Scholars are increasingly developing approaches to help researchers think through implications and harms of datafication, such as the Omidyar Network's
To excavate those viewpoints, we recommend that researchers reflect on their power relative to the people they are studying. A clear theory of power is necessary for pervasive data research because there is a strong ethical argument for studying very
As part of systematically examining power in pervasive data research, researchers should reflect on their relationships with platforms that create, collect, mediate, and disseminate pervasive data. The relationship between researchers and platforms is fraught with both trust issues and institutional privilege (Cooky et al., 2018). But cultivating trustworthy pervasive data research must include engaging with the platforms towards ethical new knowledge creation: pervasive data research cannot be confined to the secrecy (and benefit) of industry. Platforms, like universities, can contribute to institutional trust in pervasive data research and researchers, but only if their relationship to researchers is open and transparent. Pervasive data researchers partnering with corporations for data access should reflect on how and whether they can provide sufficient transparency into the researcher/platform relationship. They should also be aware of, and transparent about, the limitations that platform dynamics put on the representativeness and meaning-making potential of their data.
Researchers should also reflect on their data's relationships to, and potential interest to, governments. State power is increasingly exercised through purchase or subpoena of data created by platforms. As researchers contribute to what can be known about data subjects, they ought to consider how that knowledge may serve state power. This is particularly applicable to domains like criminal justice and immigration enforcement, but researchers ought to consider how all data science work might bolster the already powerful, a process that anthropology similarly had to undertake (Alkhatib, 2019).
Of course, determining one's power relative to research subjects, platforms, and state actors is a complex process, and we realize that such reflection is a difficult task for many without a background in theories of power. However, there are numerous helpful frameworks for thinking through issues of power adapted specifically for digital data research, including anti-essentialism (Neyland, 2016), feminism (D’Ignazio and Klein, 2020; Hoffman and Jonas, 2017), anti-racism (Benjamin, 2019; Hanna et al., 2020), anti-colonialism (Dourish and Mainwaring, 2012), and queer theory (Brim and Ghaziani, 2016). These frameworks can provide concrete guidance to researchers considering how their protocols might unevenly subject participants to increased vulnerability. And many pervasive data researchers are already successfully grappling with such frameworks. For example, Blodgett et al. (2020) have shown how discussions of “bias” in natural language processing are largely discussions of power, particularly the relationship between language and existing social hierarchies. Barabas et al. (2020) have called for studying up as a technique to improve algorithmic fairness, and Mathur et al. (2020) have used pervasive data to analyze corporate and political deceptive practices. Mohamed et al. (2020) have grappled with how to embed decolonial theories in AI practice. Kaleidoscope (2019) is a “positionality aware” machine learning project that seeks to move discussions of researcher and data context to the center of data research. And the “Tech Won’t Build It” movement has moved discussions of power and refusal from academic research to corporate practice (Science for the People, 2018). We hope that more leaders in pervasive data research will follow these examples.
Reflections on awareness and power are not required of ethnographers, or of data scientists, by institutional regulatory structures like IRBs and REBs. Within ethnographic research, however, engaging in these reflections has become bound up with definitions of ethical research, good methods, and good practice. Ethnographic researchers hold each other accountable for reflection on awareness and power, performed through techniques such as positionality statements and reflexivity in writing. Ethnographic researchers have also committed themselves (however imperfectly) to broadening the scope of participation in the discipline and have demonstrated a willingness to explicitly disavow work that conflicts with their professional organizations’ codes of ethics. We recognize that awareness and power are not the only ethical considerations in pervasive data research, but reflecting on these issues is a prerequisite to more trustworthy forms of research.
Both individual researchers and the many professional organizations that support pervasive data research can contribute to a more trustworthy research community. We hope that research leaders will guide their students and collaborators in reflecting on awareness and power, and that journal and conference reviewers will look for these reflections in the methods sections of papers. We also hope that professional organizations will establish professional norms and codes for trustworthy pervasive data research, as anthropologists and sociologists have done. Some Internet- and AI-focused research organizations have been leaders in this area (Castelvecchi, 2021; franzke et al., 2020). As data science methods are adopted into the standard research repertoires of many fields, however, traditional professional organizations need to join codification efforts. While norms and codes are insufficient on their own, they are important for setting standards of professional conduct, disseminating appropriate research practices, and resolving ambiguities and disputes over what constitutes appropriate research practice. Professional standards are also important mechanisms for learning from past experience and enshrining such lessons in public articulations of values, ethics, and norms. The history of research ethics is, in many ways and for many disciplines, composed of the scar tissue that has grown over past controversies. The state of research ethics has gradually improved as researchers have examined their past and put in place interventions to ensure such controversies are not repeated in the future.
Digital device users are already quite vulnerable as their traces are turned into data by researchers, corporations, and governments. We hope that pervasive data researchers will embrace reflections on how to decrease this vulnerability, and will hold each other to higher ethical standards, particularly with respect to openness with research subjects and reflexivity on power and impact. As researchers in social computing, computational social science, health, natural language processing, and other pervasive data fields consider their methods and sampling strategies, they should also ask themselves: are my research practices trustworthy? Critically, trustworthiness is a
Acknowledgements
This article was a product of the PERVADE project, an NSF-funded collaboration focused on empirical questions in big data ethics. Many thanks to Emily Dacquisto for logistical support on this article and throughout the project. Thanks also to the anonymous reviewers who substantially strengthened our arguments and presentation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation's Division of Information and Intelligent Systems awards 1704369, 1704425, 1947754, 1704598, and 1704303.
