Personal internet archives and ethics

Abstract

In its ethics guidelines, the Association of Internet Researchers advocates a bottom-up, case-based approach to research ethics, one that emphasizes that ethical judgement must be based on a sensible examination of the unique object and circumstances of a study, its research questions, the data involved, and the expected analysis and reporting of results, along with the possible ethical dilemmas arising from the case. This article clarifies and illustrates the mind-set and process of such a bottom-up approach to internet research ethics. Two ethics concepts to think with, namely ‘the distance principle’, and the notion of ‘perceived privacy’, are introduced and applied in a concrete empirical internet study, in which web archives based on personal communications on social media formed the main body of data. The empirical case example serves to highlight the unique challenges of internet research ethics, in light of the blurring boundaries between text and person, and between private and public on the internet, with profound implications and challenges for the definition of human subjects, privacy, and so on.

Keywords

internet research ethics, perceived privacy, social media, the distance principle, web archives

Introduction: Ethics 2.0

In 2002, the Association of Internet Researchers (AoIR) published the first set of guidelines for internet research ethics (Ess et al., 2002). The document advances a pluralist framework for ethical decision-making, one that accommodates cultural and national differences, as well as various disciplinary standards in a highly interdisciplinary and internationalized area of research (Buchanan, 2011). Ten years later, AoIR is finalizing an ethics 2.0 framework that builds on and elaborates the existing guidelines for protection of human subjects in internet research through measures such as informed consent, privacy and so on, in light of more recent technological and research developments (e.g. the advent of so-called ‘social media’, an increasing interest in data mining and big data research, etc.).¹ In the AoIR ethics document, internet research is defined as inquiry that (a) utilizes the internet to collect data; (b) studies how people use the internet; and/or (c) utilizes data sets and repositories available via the internet (Markham and Buchanan, 2012).

A key issue in question in current debates over internet research ethics is whether such research endeavours involve human subjects or not. This question is far from straightforward: there is no agreement amongst scholars and disciplines, and definitive answers seem rather elusive. Instead, AoIR’s pluralist approach to internet research ethics advocates, as I will elaborate in this article, that ethical judgement in internet research must rely on an inductive, contextual assessment of the specific case in question, and that different ethical choices may be equally legitimate and sound.

In this article, I review and apply some of the core elements of the AoIR 2.0 ethics guidelines, namely the ‘distance principle’ and the concept of ‘perceived privacy’, in a discussion of the concrete ethical decision-making in a qualitative internet research project. The project that I use as the case in point is a study of the use of social media in everyday life, conducted between 2008 and 2010, in which I used web archiving techniques to create communication archives from personal blogs and Twitter profiles. I argue for – and discuss the implications of – understanding internet research ethics as a process, in my case one involving an ongoing reflection and continued dialogue with research participants about the research aims and results. Through an illustrative step-by-step guide to the reflection and assessment of the core ethical dilemmas I encountered in my specific internet study, I aim to provide clarification and fuel for future discussion of some of the key ethical conundrums commonly encountered in internet research.

Ethics and internet research: Core principles of ethics to ‘think with’

Broadly, research ethics questions and decisions revolve around the respect for human subjects, their autonomy, and protection from harm in the process or aftermath of research. As elaborated in a recent general overview of internet research ethics (Buchanan, 2011), the internet complicates the basic question of personhood in research ethics. Does a personal blog equal a human subject? Or is a blog to be considered a textual artefact, that is, a cultural product that may be attributed to an author but nonetheless is somewhat detached from the person who produced it? The way internet phenomena, and with this, the data of internet researchers, are conceptualized in regard to personhood will determine whether the research involves human subjects or not, and thus under what circumstances ethics measures such as informed consent are required or recommended.²

A related key issue in ethics discussions of internet research concerns the blurring notions of ‘public’ and ‘private’ spheres of interaction associated with the communicative configuration of various online domains and genres. For instance, while it may be clear that an online newspaper constitutes a public space, the information posted on a publicly accessible Twitter or Facebook profile of an ordinary user might not always be readily defined as public information. Think, for example, of a personal conversation among friends on the Facebook wall, or a photograph of one’s new-born child posted to Facebook. How should such data be treated on the public−private scale in the context of research ethics?³

Questions like these are complex and difficult to tackle; there is far from common agreement among scholars of the internet and other digital media. Hence, for ethical assessment and decision-making in internet research there is no ‘one-size-fits-all’ scheme, and thus no commonly accepted best practice to lean upon for researchers and Institutional Review Boards (IRBs).⁴ Against this background, the AoIR guidelines advocate a bottom-up, case-based approach to research ethics, one that emphasizes that ethical judgement must be based on a sensible examination of the unique object and circumstances of a study, its research questions, the data involved, the expected analysis and reporting of results, along with the possible ethical dilemmas arising from the case. To support a careful ethical reflection and evaluation, the guidelines offer a set of core principles of ethics to ‘think with’, two of which I will discuss in some detail, namely the distance principle and the notion of perceived privacy. These two principles address the key issues, outlined above, of personhood and privacy, respectively, and will serve as theoretical reference points for the case-based ethics discussion that I provide in the following section. The guidelines further recommend examining previous research of a similar kind for concrete guidance to the solution of possible ethical dilemmas.

The distance principle concerns the conceptual or experiential distance between the object of research and the person who produced it. It is thus not merely a description of the closeness between researcher and subject, but of the relationship between the online data that the researcher wishes to collect and analyse (posted texts and images, a set of relationships among entities, demographic information posted on social network profiles, online survey responses etc.) and the person(s) whose activities created the data. For example, we may more likely identify a study using a qualitative sample of emails, status updates or blog posts as one that involves human subjects, because the experiential distance between an identifiable blogger and a blog text is perceived to be small (e.g. Kendall, 2002; Sørensen, 2009), whereas a big data study with an automated sample containing millions of tweets, or a study of traffic patterns on the world wide web might be naturally perceived as one that does not involve human subjects, because the experiential distance between the (millions of) specific persons and the huge log files produced on the basis of their online activities is relatively long (e.g. Horan, 2012; Leskovec and Horvitz, 2008).

The distance principle also applies to reflections regarding what types of analysis the data are used for: for example, a close content or discourse analysis of a (large) data set, using text examples, may more likely invoke the idea of dealing with human subjects than might an analysis of a similarly large data set that examines, say, link patterns, temporal structures, languages, use of third party applications, or the prevalence of conversations on Twitter, and that presents data only in aggregate form. Hence, the character of the data set, the intended analyses and the reporting of results are all components to consider in a bottom-up approach to internet research ethics.

The concept of perceived privacy concerns the expectations that internet users may hold concerning the privacy of their online activities, their control over personal information, and their protection from harm. Several scholars have made the keen observation that although many forms of online communication are publicly accessible, and available for anyone who cares to read along, participants may perceive their postings as private (e.g. Sveningsson-Elm, 2009; for a review of this conundrum, see McKee and Porter, 2009: 6−11). While such a perception of privacy may seem naïve, given the public default settings for many online sites and services, privacy expectations may not only involve the posted information itself, but also what it is used for (see Markham, 2012). Nissenbaum, for instance, has proposed a privacy model based on what she labels ‘contextual integrity’, arguing that ‘what people care most about is not simply restricting the flow of information but ensuring that it flows appropriately’ (Nissenbaum, 2010: 2 [emphasis in original]).

Taken to the extreme, the notion of perceived privacy implies that the researcher is not experientially close enough to the participant or sensitive enough to make a judgement as to whether and under what circumstances blog posts, tweets, listserv postings and so forth are considered private – the researcher becomes an outsider; however, read as an ethics concept to think with in a concrete study, perceived privacy highlights the ethical complex arising from the blurring of public and private spheres and circumstances of communication on the internet, and may thus, along with the distance principle, be a useful tool for thinking about the potential for, and the protection of the subject from, harm in the process of research.

The case – constructing personal archives from social media

At this point, the discussion of the distance principle and perceived privacy, as core ethics concepts in the AoIR 2.0 guidelines for internet research ethics, will benefit from a case that illustrates thinking through the question of personhood, and the associated ethical dilemmas and paths taken in the process of a concrete internet research project. For this purpose, I draw on my PhD project (Lomborg, 2011), which was driven by an empirical interest in how social media become part of the fabric of meaning of the daily lives of ordinary Danes. The study involved two qualitative case studies of the use of personal blogs and Twitter, how these uses were incorporated into the rhythm of the users’ everyday lives, and how blogs and Twitter acquired significance and specific meanings in this process. Theoretically, I framed social media in terms of communicative genres, a framework centred on the argument that participants make sense of their online practices through invoking, enacting and negotiating specific understandings and expectations of Twitter and blogging in their engagements with these genres. Thus, I wanted to understand participants’ blogging and Twitter practices from an emic perspective, and the goal of the analysis was to provide thick, contextualized descriptions of how participants actively negotiated and made sense of their engagement with blogging and Twitter in and through communicative practice. This required detailed analysis of the users’ actual, situated enactments of blogging and Twitter in communicative practice.

To capture these communicative practices in their detail, and thus get an appropriate analytic object, I developed a main data set consisting of complete archives of the posts and comments for a number of personal blogs and Twitter profiles for a given time period (six months for the blog study and one month for the Twitter study). The blog and Twitter archives that I constructed are based on individual blogs and Twitter profiles, and display very neatly organized flows of blog posts and comments, and tweets, @replies and RTs, respectively, in the chronological order in which they were posted. In addition to the archives, I also interviewed the participants about their blogging and Twitter practices, their relationships with online peers, and the specific functions and meanings participants assign to blogging and Twitter in the conduct of their daily activities. The specific methodological issues and challenges of working with archives are addressed in Lomborg (2011).

An ethical conundrum

In this section, I want to use my specific case study as a step-by-step illustration of ethical decision-making in internet research, and as a baseline for discussing personal internet archives more broadly in terms of the ethical questions that must be addressed when dealing with this type of data. In my case, the ethical conundrum is as follows. First, the communicative practices that I aimed to capture were performed in spaces that are in some sense public and accessible for anyone who wants to read along. Most blogs and Twitter profiles, including the cases selected for my two studies, were publicly accessible online, and did not require readers to identify themselves through a login and password to access the blog posts and tweets. Second, although to some extent the postings included personal information (i.e. everyday musings and small talk), the data fell under the definition of ‘non-sensitive information’ as defined by the Danish Data Protection Agency. According to the Agency, sensitive information includes personal information regarding health, abuse, ethnicity, political or religious conviction, sexuality, criminal records and the like (the Danish Data Protection Agency, 2000: 13), none of which was present in the (typically) highly mundane and everyday-anchored blogs and Twitter profiles that I wanted to study. On this basis alone, one could make a legitimate, albeit strictly legal, argument that no consent would be needed for the study (for a similar argument, see, for example, Wilkinson and Thelwall, 2011).

Examining the case from another angle, however, one could argue that the construction and analysis of personal archives from blogs and Twitter profiles over an extended period would constitute a rather close tracking of specific users’ online behaviours and interactions. In other words, the archives would present a detailed mapping of what given, identifiable persons say online, who they talk to, at what times during the day, and so on. Furthermore, given that these users posted extensively about their everyday lives, their passions and hobbies, their social relationships (family, friends, colleagues were often part of the postings, although seldom mentioned by name or other identifiable information) and so forth, the content of blog posts and tweets appeared to be experientially close to the users who wrote them (compare the distance principle).

Moreover, I could not know in advance the users’ expectations of privacy. The participants might perceive their blogs and Twitter profiles as personal and private spaces (Sveningsson-Elm, 2009), or at least as spaces that one would not expect to be observed and analysed for research purposes. This possibility of perceived privacy could lead to potential participants finding a research project like mine rather intrusive. So perhaps, framed sharply, participants might understand my data collection as a kind of ‘eavesdropping’ – listening in on a conversation that I was not part of, and not meant to be part of. For this reason, I concluded that, to archive and use their blogs and Twitter profiles as part of my data material, I was ethically obliged to ask permission from the authors of the cases I wanted to include.

This first step in my ethical judgement primarily concerned the process of selecting cases and collecting data. However, another ethical issue arose with regard to the analyses and documentation thereof, including meeting common scientific standards for the responsible use of data, and for ensuring transparency in the qualitative analysis of the data archives. Specifically, to document the analysis, I needed to use (and quote from) excerpts from the collected material as part of the analysis. Under these circumstances, granting anonymity and giving the study participants pseudonyms would be useless – a simple string search would uncover the texts and identify the participants.⁵ The data were of a rather mundane character, and thus unlikely to cause participants any harm, and my research aimed not at understanding the participants as unique individuals (i.e. examining motives, psychological profiles etc.) but at understanding the norms and conventions guiding communicative practice in social media (i.e. regarding participants as representing a category or type of users). Nonetheless, the impossibility of anonymity raised questions about how to assess the potential future harms (and benefits) of participating in the project in a larger time-frame, and how to protect participants in this regard. To compensate for the lack of formal anonymity, I opted to implement other ethical measures – both formally in the informed consent agreement, and informally, in the continued interaction with participants – to establish a trustful and respectful relationship with the research participants.

Inspired and supported by the original ethical guidelines developed by AoIR (Ess et al., 2002), I talked to the participants about the project aims of understanding social media practices in the context of everyday life, and their roles as participants representing types in this respect. I then asked the participants to read and sign an informed consent form before the study began. The form: (i) explained my research, what data I wanted to collect and how it would be used; (ii) requested permission to archive the activity on their blogs and Twitter profiles, and to use these data publicly in my research; and (iii) stated their rights, including the right to withdraw from the study at any point in the process, thereby also terminating my right to use their data in the analysis.

In addition to this formal agreement, I decided to make ethics more of an integrated part of the research process. In other words, I began thinking about ethics as a process of ongoing assessment of perceived privacy and ongoing establishment of rapport and trust with participants in the context of research through dialogue (for a similar argument about ethics as process, see Beaulieu and Estalella, 2012). Thus, I implemented some informal ethical procedures, specifically: talking to the study participants about the analysis that I had done; volunteering to share the analyses with them, upon their request, so they could check if there was something that they felt uncomfortable about having shared; and offering to refrain from quoting directly any tweets or blog posts that they considered sensitive. This did not compromise my research project, as I was primarily interested in communication that reflects normal, everyday blogging and Twitter practices. Despite my willingness to leave out sensitive content, only one of my participants asked me to exclude a single post from direct citation in the analysis – one that in my best judgement does not stand out from the rest of the material.

While this process approach in a sense made me as a researcher more vulnerable, taking the notion of perceived privacy seriously in this project, and showing care for the use of the data by talking with the participants about these issues, also had its benefits from the perspective of the ethics/methods interface. Specifically, I believe that addressing possible ethical concerns at such length to establish trust when initiating the empirical research process helped create a more ‘safe’ and comfortable environment for talking with participants about their practices, experiences and feelings in relation to their online activities in the interviews that I conducted to complement and contextualize the personal internet archives. But most importantly, seen from a research ethics perspective, my commitment to dialogue with participants in the process of research reflected my identification of the collected data as closely connected to human subjects acting in a perceivably private or semi-private sphere, and thus functioned to show acknowledgement of and respect for the autonomy of research participants, as advocated by the AoIR guidelines.

Aftermaths: Ethics as a two-way process

The ethical measures that I took in this project were risky for two reasons: (i) by allowing participants to withdraw from the project at any point in the research process, the project in a sense became a gamble: had one or more participants withdrawn from the study, parts of the analyses would have fallen apart, with consequences for the process of writing up the study, as well as the scientific gain and contribution of my project to the scientific study of social media; (ii) by negotiating privacy with the participants in the process of analysis, and specifically by offering to omit sensitive excerpts from direct quotation, I arguably ran the risk of lessening what Markham (2012) labels the researcher’s interpretive authority over the data. Reassessing the process at this point in time, were these ethical measures necessary, and what would their implications be for doing internet research and for scientific practice in general?

On the one hand, from an ethical standpoint, there are a number of reasons why the implemented ethical procedures were reasonable and well justified. First of all, they were culturally and contextually sensitive, given core Danish cultural values of openness and trust both in regard to interpersonal relations and societal institutions.⁶ Second, they showed profound respect for the privacy and autonomy of the research participants, including not simply their expectations (whether justified or not) while posting information online, but also their subsequent reflections and feelings regarding their postings as now reused in a public (academic) context. Third, they are consistent with earlier internet research projects drawing on participant observation – including feminist/communitarian approaches (e.g. Bromseth, 2006; Hall et al., 2004) – that, even if not required by law or code, place the highest ethical value on respecting the autonomy and wishes for privacy of the research participants. Fourth, they are consistent with the range of approaches to informed consent, and the idea of informed consent as a continuous negotiation, presented by Lawson (2004).

On the other hand, I might have taken too great measures to recognize the autonomy of research participants, and create a trustful relationship with them, given the non-sensitive research questions I was asking in this project.⁷ On a general level, being ethical may, in a sense, make it very difficult to set clear boundaries between the researcher as the person conducting (and thus responsible for) the research, and the research participants whose data inform the research. Arguably, by striving for ethical conduct, the researcher may run the risk of changing the power balance between researcher and participant, so that ultimately the care and respect for human subjects outweighs the concern for scientific development, and, thus, the greater good. Relating back to the notion of interpretive authority, it is reasonable to question whether we can assume research participants to be capable of, in any valid and reflexive manner, assessing their own data in connection with the context in which these data are to be presented. Consider the request of one of my participants to have quotes from a (to me) very typical and seemingly unproblematic posting omitted in the final analysis, and my respect for this request. Although my decision to omit the quote was clearly and ethically based on acknowledgement of the participant’s autonomy and contextual integrity, the participant’s wish was probably not based on reflexive, scientific judgement of relevance and importance in the context of research, but on the idiosyncratic feelings of the participant. 
This is problematic, not only because it might undermine broadly accepted scientific standards of data management, analysis and documentation, but also because it obscures the fact that the process and products of research are, in the end, the researcher’s responsibility – and that the researcher, qua being the one who has an overview of the project, the one asking the research questions, and the one with professional qualifications and experience, is probably better equipped for making sound ethical and analytical judgements. As Markham (2012: 15) argues, ‘the researcher must take seriously the role of cultural interpreter, and gain interpretive authority through rigorous and constant practice of their craft’. This implies recognizing that ethics is a two-way process − one in which both the participants’ personal information, and the researcher’s rights to analyse and interpret data, must be protected. When balancing these considerations, in retrospect, it is clear that other ethical judgements could have been equally legitimate in my case. For instance, based on the argument of interpretive authority, the informal ethical measures I applied (i.e. granting participants the right to withdraw data throughout the process, and offering to refrain from quoting specific postings directly in the analyses) might have been neither necessary nor appropriate.

Conclusion

In this article, I have illustrated the mind-set and approach to internet research ethics advocated by the AoIR, namely, an inductive, case-based approach that takes as a starting point for ethical decision-making a careful examination of the ethical issues arising from the specific research project. Applying two ethics concepts to think with − the distance principle, and the notion of perceived privacy, as they are elaborated in the AoIR ethics 2.0 guidelines − I have highlighted the unique challenges of internet research ethics, in light of the blurring boundaries between text and person, and between private and public on the internet, with profound implications and challenges for the definition of human subjects, privacy and so on. To address these complex challenges, the AoIR ethics guidelines draw attention to the specific details of a research project (i.e. research questions, methodology and data, analyses and publication of findings), as well as disciplinary standards, and cultural-contextual factors, as dimensions to consider in ethical decision-making.

The case-based approach to ethical judgement implies that no final, all-encompassing standards can be established to ensure and evaluate ethical conduct in internet research. Instead, the AoIR ethics guidelines may serve to stimulate and inspire ethical reflection throughout a research project. Furthermore, promoting a set of ethics concepts to think with, the AoIR ethics 2.0 guidelines offer a framework for internet research ethics as an area of inquiry, and accumulate existing research in this area. Hence, the guidelines may function as a repository of cases and discussions to lean upon for future research.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

References

Beaulieu and Estalella (2012) Rethinking research ethics for mediated settings. Information, Communication & Society 15(1): 23–42.

Bromseth (2006) Researcher/woman/lesbian? Finding a voice in creating a researcher position, trust and credibility as a participant researcher in a mediated mailing-list environment for lesbian and bisexual women in a time of conflict. In: Liamputtong (ed.) Health Research in Cyberspace. Hauppauge, NY: Nova Publishers, 85–104.

Buchanan (2011) Internet research ethics: Past, present, future. In: Consalvo and Ess (eds) The Blackwell Handbook of Internet Studies. Oxford: Oxford University Press, 83–108.

Danish Data Protection Agency (2000) Persondataloven [Act on Processing of Personal Data]. Available at: http://www.datatilsynet.dk/fileadmin/user_upload/dokumenter/Persondatalovspjece/Persondatalovspjece.pdf.

Ess and AoIR Ethics Working Committee (2002) Ethical Decision-making and Internet Research: Recommendations from the AoIR Ethics Working Committee. Available at: http://www.aoir.org/reports/ethics.pdf (accessed 21 January 2007).

Hall, Frederick and Johns (2004) ‘NEED HELP ASAP!!!’: A feminist communitarian approach to online research ethics. In: Johns, Chen and Hall (eds) Online Social Research: Methods, Issues, and Ethics. New York: Peter Lang, 239–252.

Horan (2012) ‘Soft’ versus ‘hard’ news on microblogging networks. Information, Communication & Society. Online first, DOI: 10.1080/1369118X.2011.649774.

Kendall (2002) Hanging out in the Virtual Pub: Masculinities and Relationships Online. Berkeley, CA: University of California Press.

Lawson (2004) Blurring the boundaries: Ethical considerations for online research using synchronous CMC forums. In: Buchanan (ed.) Readings in Virtual Research Ethics: Issues and Controversies. Hershey, PA and London, UK: Information Science Publishing, 80–100.

Leskovec and Horvitz (2008) Planetary-scale views on an instant-messaging network. Microsoft Research Technical Report. Proceedings of WWW 2008, Beijing, China. Available at: http://research.microsoft.com/en-us/um/people/horvitz/leskovec_horvitz_www2008.pdf.

Lomborg (2011) Social media. A genre perspective. Doctoral thesis. Aarhus University.

McKee and Porter (2009) The Ethics of Internet Research. A Rhetorical, Case-based Process. New York: Peter Lang.

Markham (2012) Fabrication as ethical practice: Qualitative inquiry in ambiguous internet contexts. Information, Communication & Society. Online first, DOI: 10.1080/1369118X.2011.641993.

Markham and Buchanan (2012) Ethical decision-making and internet research (version 2.0). Recommendations from the AoIR Ethics Working Committee. Chicago: Association of Internet Researchers.

Millard and Hon (2012) Defining ‘personal data’ in e-social science. Information, Communication & Society 15(1): 66–84.

Nissenbaum (2010) Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford: Stanford University Press.

Sørensen (2009) Social media and personal blogging: Textures, routes and patterns. MedieKultur 47: 66–78.

Sveningsson-Elm (2009) How do various notions of privacy influence decisions in qualitative internet research? In: Markham and Baym (eds) Internet Inquiry: Conversations about Method. Thousand Oaks, CA: SAGE, 69–87.

White (2002) Representations or people? Ethics and Information Technology 4(3): 249–266.

Wilkinson and Thelwall (2011) Researching personal information on the public web: Method and ethics. Social Science Computer Review 29(4): 387–401.

Zimmer (2010) ‘But the data is already public’: On the ethics of research in Facebook. Ethics & Information Technology 12(4): 313–325.