
Abstract

With the exploding availability of online data, digital open-source investigations (OSINV) methods have become increasingly popular in journalism. However, practitioners face novel challenges related to the tension between journalism’s transparency ideals and its duty to safeguard the privacy and security of data subjects. This article explores this tension by drawing on data from eight in-depth interviews with professional open-source investigative journalists in the Netherlands. The findings of our study reveal that OSINV investigators rely heavily on personal assessments and ongoing dialogues with colleagues to make privacy-related editorial choices, as rules and guidelines have only recently emerged. This research provides valuable insights into the intricacies of OSINV journalism, uncovering the delicate balance between journalistic transparency and privacy/security considerations.

Keywords

journalism, open data, OSINT, online data, digital open-source investigations, privacy, security, transparency

Introduction

Journalists¹ are increasingly turning to open-source investigation² tools and methodologies as digital spaces provide access to a greater amount of data (Di Salvo, 2022). These tools allow journalists, especially investigative reporters, to leverage previously unavailable or nonexistent datasets, such as satellite images, corporate registries, and many types of user-generated content, to conduct advanced public interest investigations and tell stories that otherwise would not be told (Müller and Wiik, 2023). Although researchers and practitioners have started to explore the ethical aspects of open-source investigations (OSINV), questions about investigators’ perceptions of transparency, privacy, and security when using open-source data and methods remain largely unanswered (Dubberley and Ivens, 2022).

The availability of user-generated photos, audio recordings, and videos is a salient feature of the ongoing process of datafication (Mejias and Couldry, 2019). While these media objects can be understood as eyewitness accounts of human conflict, which are especially valuable for staying informed on areas that are hard to reach (Dubberley et al., 2020), they also risk harming their creators if used uncaringly (Eijkman and Weggemans, 2012). As a result, ethical questions about methods, data sources, and data processing seem to be inherent to the practice of OSINV. Who decides whether certain content is ethical to use? Are there existing ethical guidelines to fall back on? Do open-source journalists receive training to conduct investigations safely?

According to Edwards (2022), the number of resources dedicated to the ethics of media open-source research is increasing. Workbooks, manuals, and guidelines are being developed to assist open-source journalists in navigating these questions. Notable examples include the Berkeley Protocol, published by the Human Rights Investigations Lab (2022), and the Guidelines for Public Interest OSINT Investigations (2023), created by a consortium of European practitioners. Nevertheless, it is unclear if and how open-source organizations and newsrooms adopt these resources. Online newsrooms, in general, remain “woefully under-researched” (Manninen, 2017) in these areas.

Central to the open-source ideology is the emphasis on transparency of sources and openness of methods (Hammond, 2017), which supports the replicability of open-source investigations and potentially increases public trust and legitimacy (Meijer et al., 2014). Such “openness,” as van der Velden (2021: 4) argues, is not an optional feature, as it might be in the case of traditional investigative journalism, but it actually “folds into the methodological choices” that OSINV researchers make as an epistemic practice. This openness creates tension between the transparent nature of open-source investigative practice and the privacy needs of the civilians producing the material upon which investigations rely. A paradigmatic example of this tension is the use of what practitioners refer to as information that lives in a “gray zone.” Following the work of Hribar et al. (2014), in this article, we define “gray zone information” as information whose ethical usage is not immediately evident or readily determinable by an average investigator. Moreover, because investigators’ use of gray zone information exposes the fracture lines within OSINV practice, we contend that treating it as a case study allows us to observe the ongoing ethical discussions through which OSINV practitioners determine how to protect the privacy of their sources.

To gain insight into how OSINV practitioners balance privacy concerns arising from using gray zone information with their investigative goals, this article asks the following questions: (RQ1) How do open-source investigative journalists understand and mitigate privacy issues? (RQ2) What stance do open-source journalists have toward gray information? And (RQ3) what guidelines and guardrails do newsrooms put in place when open-source journalists deal with privacy issues? We draw from interviews with eight open-source investigators working in OSINV organizations registered in the Netherlands to answer these questions. Dutch organizations are an interesting sample for this study given their international outlook resulting from the closeness to the Non-Governmental Organizations ecosystem surrounding The Hague’s international institutions.

Literature review

Open-source investigations

Due to its widespread adoption beyond official intelligence organizations, the open source in OSINV has acquired two meanings (van der Velden, 2021). The first honors the military roots of the term and refers to the use of non-classified data to create actionable insights. The second and most recent meaning understands OSINV as a subset of the open-source software movement, characterized by a belief in social responsibility through openness and collaboration. In this section, we briefly establish the links between OSINV and the broader open-source movement to make explicit how its ethos interacts with core journalistic values.

During the 1990s, the open-source software movement emerged as a branch of the hacker community, strongly emphasizing experimentation, play, and democratic ideals. According to Hansen (2015), these individuals were motivated by a pro-social interest in information liberation and the free flow of knowledge. Thus, the primary objective of the open-source movement was to ensure that information was freely available to all without requiring individuals to pay for access to software code or data (Hansen, 2015). The movement advocated for generating revenue by offering services and practices that are based on open-source information (Young, 1999).

In the early days of the internet, easy access to data and tools predominantly interested state actors and corporate departments at one end and social media hobbyists at the other (Edwards, 2022). Corporate entities used open-source data and tools to compile risk assessments. In contrast, social media hobbyists scoured images and videos to fact-check the claims that parties in armed conflicts had made. From 2010 onwards, investigative agencies focused specifically on collecting and analyzing digital data (e.g., Forensic Architecture, Bellingcat, and the Syrian Archive) were established and picked up on the possibilities that open-source data and tools offer to map out human rights violations accurately and report on them.

The enormous amount of easily accessible data and the increased technical possibilities that enable open-source intelligence brought an entirely new dynamic to journalism (Müller and Wiik, 2023), particularly to investigative journalistic practices and war reporting (Cooper and Mutsvairo, 2021). This new dynamic consists of new digital tools, methods, and meeting points to conduct advanced investigations and collaborate with other investigative reporters across borders (Carson, 2021; Dodds, 2022). As McGregor et al. (2017: 1) put it, “over the past 15 years, the professional journalism industry has undergone a paradigm shift, wrought largely by the adoption of cooperative technologies both inside and outside the newsroom.”

The introduction of open-source methods into journalistic practice has been described as a “collaborative turn in investigative journalism” (Müller and Wiik, 2023) or the rise of global networked journalism (Berglez and Gearing, 2018). Online, investigative journalists are part of vast networks of colleagues, often working with the same data and creating stories collectively, such as in high-impact cases like the Panama or the Paradise Papers carried out by the International Consortium of Investigative Journalists. The benefits of working collaboratively and digitally include sharing costs and information, increased story reach, and the allowance for more complex reporting on a global scale (Carson and Farhall, 2018). However, McGregor et al. (2017) also contend that within the framework of shrinking physical newsrooms–which compel journalists to operate in a more collaborative and digital manner–there is an increasing concern among journalists regarding their security and privacy. To mitigate these concerns, journalists are adapting by migrating to more secure communication channels and erasing messages that could potentially compromise their sources.

Even though open-source ideology and methodology stem from hacker culture and are intertwined with the internet’s historical development (Kelty, 2008), some of its normative values correspond to those of the journalistic culture (Lewis and Usher, 2013). One key value shared by these two cultures is participation, which, although not historically part of the normative framework of journalism, has emerged as part of the journalistic ethos for the digital age (Dodds, 2021). Just like within open-source ideology, journalistic participation suggests that consumers take on a more active, monitoring, and engaged role – helping to supervise and co-produce the news instead of merely commenting on it post-publication. Instead of treating news as an end-product, open-source practices turn journalism into a participatory process to which users can meaningfully contribute (Robinson, 2011).

A second normative value that open-source ideology and journalism share is transparency, which fosters notions of accuracy and sincerity among news consumers (Phillips, 2010). In open-source journalistic practice, disclosure transparency is the norm. This refers to the process by which journalists provide a detailed explanation of how they produce news (Karlsson, 2010). Through sharing which sources were used and which steps were undertaken, transparency leads to the replicability of the investigation, enabling people to trace back a story and fact-check it themselves (Phillips, 2010). As in academia, journalistic replicability leads to a sense of accuracy and legitimacy among consumers (Eijkman and Weggemans, 2012). At a time when the markers of journalistic authority – monopoly of news selection, objectivity, commitment to democracy – no longer hold self-evident legitimacy, transparency is increasingly viewed as a way to restore this authority (Perdomo and Rodrigues-Rouleau, 2022). Thus, enabling and innovating journalistic authority and legitimacy through transparency has become a promising feature of open-source journalism (Dodds, 2021). Although transparency fosters notions of accuracy and sincerity, the increased openness it entails can lead to security violations and privacy breaches (Meijer et al., 2014).

The civilian visual security paradox

Conceiving privacy as “freedom from unreasonable constraints on the construction of one’s identity” (Agre and Rotenberg, 1998: 7) raises questions about the roles and responsibilities of open-source digital journalists and organizations who fuel their investigations with user-generated data. This is because digital open-source investigations can lead to exposing people’s most sensitive personal data, such as identity traits and locations, making them susceptible to surveillance and harassment. What Saugmann (2019) terms the civilian visual security paradox exemplifies how open-source journalists can put civilians in danger. The civilian visual security paradox describes how images and videos that civilians in conflict areas post to call attention to their circumstances can quickly turn into sources of danger when digital open-source journalists fail to deal with this content with care. Saugmann further argues that open-source investigators must “respect the protected status of civilians in their online collection practices – so far, however, there is little sign of such respect” (2019: 344).

People who are visible in (sensitive) videos or images may be exposed to different risks when an open-source investigator decides to use such content. For example, suppose a video of a protester against an authoritarian regime is posted on a social media account and later embedded in an open-source news item. In that case, it might lead regime authorities to track and punish this person. Though users may be posting the content themselves, journalists who reuse it must keep in mind the user’s right to privacy (Pastor-Galindo et al., 2020), for making the content publicly accessible is not equal to asking for it to be distributed, aggregated, or otherwise scaled. When embedding the content of everyday internet users, open-source investigators may trigger what Bellingcat investigator Giancarlo Fiorella (2021) refers to as ‘the spotlight effect.’ The spotlight effect occurs when hitherto ‘unnoticed’ content is embedded in open-source news stories and, through reaching a large audience, goes viral. Consequently, a ‘spotlight’ is cast on the publisher of the content, which, for example, can lead to unwanted exposure, privacy breaches, and other safety risks.

Ultimately, journalists’ decision to publish potentially harmful content should be made based on a ‘balancing test’ that compares the potential harms and the possible benefits (Gauthier, 2002). While this decision-making process may be relatively straightforward when dealing with user-generated content that includes personal information and has been posted on an open platform, open-source investigators increasingly face more challenging scenarios. Three developments drive this: the increased occurrence of data leaks; the popularization of platforms that have both private and public features, such as Telegram (where some invitation-only channels reach thousands of members) (Rogers, 2020); and the consolidation of a market of data brokers that aggregate data from various sources and provide access as-a-service (Lamdan, 2022), such as people/telephone/email finders or the less contentious heritage/classmates websites. Together, they create an ever more common scenario wherein open-source investigators are required to deal with gray information whose usage requires more thorough ethical reflection.

Gray information

Although open-source journalism is all about reporting based on openly available data, there are cases in which the line between open and closed data becomes blurry. According to Hribar et al. (2014), there exists a gray zone of open-source investigation where gray information resides: semi-legal information, generally not distributed widely and often questionable. Examples include ‘inside information’ from a company’s personnel, the contents of leaked databases, and videos, images, and messages taken from closed or private digital networks, like Telegram groups.

Generally, gray information entails data that was meant to be closed but was made openly available through hacking and leaking, or data that becomes questionable after aggregating multiple data points that, individually, are not likely to be harmful. In legal terms, the “gray zone of open-source intelligence” is an area where community practices and legal interpretation may have a soft clash (Hribar et al., 2014). So far, there is no consensus on ethical guidelines regarding the use of gray information (Rambukkana, 2019), nor do we know enough about how open-source journalists approach its use.

Ethics guidance inside open-source journalistic organizations

The relatively new nature of digital open-source journalistic practice means that there is limited academic literature on the structures of open-source organizations and newsrooms (Müller and Wiik, 2023). Research on open-source ethos, however, points out that journalistic organizations that conduct open-source investigations emphasize communication, transparency, and strong bonds between members (Belghith et al., 2022). Explicit guidelines about what techniques or tools should be used in different scenarios are rare. While organizations like Bellingcat, the Center for Information Resilience, and, more recently, Al-Jazeera have published documents outlining their approach and methods, they are some of the few to do so. At the date of writing, only Bellingcat had published a document containing specific guidance about the ethics of the investigative process. Still, it focused only on the case of investigating Russia’s full-scale invasion of Ukraine (2022).

In keeping with the overarching ethos of participation, the OSINV community often provides its members with opportunities to improve their investigation skills, e.g., through training and workshops (Belghith et al., 2022). Whether ethical and privacy issues are discussed in these spaces remains under-researched. However, the general literature on the role of newsrooms in dealing with journalistic privacy issues shows that codes of ethics serve as crucial accountability tools that “every major professional organization has adopted and revised: individual news organizations build their own codes to clarify ethical expectations for employees” (Whitehouse, 2010: 313). Answering RQ3 (What guidelines and guardrails do newsrooms put in place when open-source journalists deal with privacy issues?) will help determine whether this is also the case for open-source organizations and newsrooms.

Methodology

To understand how open-source journalists balance privacy concerns with their investigative goals and navigate privacy-related ethical and legal considerations, we conducted semi-structured interviews with eight Dutch open-source journalists (n = 8) in May 2023. The interviews lasted 52 minutes on average and were conducted via Zoom. All interviews were conducted in Dutch and later translated into English.

Exploring how OSINV investigators balance privacy concerns with their investigative goals calls for in-depth inquiry, which qualitative interviews can provide. These are guided conversations in which the researcher carefully listens to the meanings that participants attach to the research subject(s) (Gubrium and Holstein, 2002). The interview guide (see Supplemental Appendix A in the OSF: tiny.cc/hedzwz) was prepared before conducting the interviews, which were audio-recorded and transcribed verbatim. The audio recordings of the interviews were anonymized and remain confidential.

Open-source journalists had to fulfill specific criteria to fit the sample. First, conducting open-source investigations had to be their primary work-related practice. This essential condition stems from the expectation that full-time digital open-source journalists are likely to be experts who actively participate in the OSINT community, abide by the values and ethos of the movement, and are well-versed in its techniques (Belghith et al., 2022). Because digital open-source practice is relatively new to the realm of journalism, journalists who do open-source investigations ‘on the side’, or in a more novice manner, might not be as aware of privacy issues as full-time open-source journalists would be. Second, open-source journalists had to work for a journalistic organization conducting open-source investigations. This criterion was established to make it possible to answer RQ3 (What guidelines and guardrails do newsrooms put in place when open-source journalists deal with privacy issues?).

Following the work of Belghith et al. (2022), we recruited open-source journalists through purposive and snowball sampling. Recruitment of participants started with purposive sampling through online requests. Upon finding willing participants, snowball sampling was adopted: Participants were asked to nominate other expert open-source journalists. Despite serious attempts to recruit more female participants, the sample comprised seven men and one woman, all professional investigative journalists working with open-source data, methods, and tools at open-source newsrooms. Literature shows that women are underrepresented in open-source journalistic practices (Vuyst, 2020), which again became evident during the sampling process of this study. The participants represented three (Dutch) journalistic open-source organizations or newsrooms: NOS, Pointer, and Nieuwscheckers. NOS is the Netherlands’ national broadcaster and is funded by the Dutch government. Pointer is the open-source and data editorial department of KRO-NCRV, a renowned Dutch public broadcaster. Nieuwscheckers is a Dutch initiative funded by Leiden University.

Given the specialized and relatively small universe of open-source investigators in the Netherlands, we reached theoretical saturation with eight interviews. Theoretical saturation is reached when additional interviews no longer produce new information or themes related to the research questions (Guest et al., 2020). We found that after eight interviews, no new themes were emerging in the data, suggesting that saturation had been achieved. It is important to note that the number of participants in qualitative research, especially in niche fields like ours, is often guided by the richness of the data and the specificities of the research context rather than by predetermined numerical thresholds.
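The saturation criterion described above can be illustrated with a short sketch. It is not the procedure or the data used in this study: the theme labels are hypothetical placeholders, and the `run_length` parameter (how many consecutive interviews must add nothing new) is an assumption introduced only for illustration.

```python
from __future__ import annotations


def new_themes_per_interview(coded_interviews: list[set[str]]) -> list[int]:
    """Count how many previously unseen themes each successive interview adds."""
    seen: set[str] = set()
    counts: list[int] = []
    for themes in coded_interviews:
        fresh = themes - seen          # themes not observed in earlier interviews
        counts.append(len(fresh))
        seen |= themes
    return counts


def is_saturated(coded_interviews: list[set[str]], run_length: int = 2) -> bool:
    """True if the last `run_length` interviews contributed no new themes.

    `run_length` is an illustrative assumption, not a value from the study.
    """
    counts = new_themes_per_interview(coded_interviews)
    if run_length < 1 or len(counts) < run_length:
        return False
    return all(c == 0 for c in counts[-run_length:])


# Hypothetical theme labels, one set per interview (placeholders, not study data).
example = [
    {"image_alteration", "encryption"},
    {"encryption", "gray_information"},
    {"gray_information"},
    {"image_alteration"},
]
print(new_themes_per_interview(example))
print(is_saturated(example, run_length=2))
```

Tracking the per-interview count, rather than only the final yes/no answer, makes it visible how quickly new themes taper off, which is the kind of evidence a saturation claim rests on.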

The data gathered through the interviews were processed and ultimately categorized into themes through open and axial coding. First, open coding took place: textual data from the interviews were broken up into codifiable parts, yielding concepts that were grouped together and turned into categories. Second, axial coding established connections between codes to create categories. The codebook that was established during this process is presented in Supplemental Appendix B, which can be seen in the OSF: tiny.cc/hedzwz.

Results

This article explores how Dutch digital open-source journalists balance privacy concerns with their investigative goals. It does so by investigating what measures they take to protect the privacy and security of data subjects and what role their organizations play when dealing with privacy issues. The first subsection addresses how open-source investigators understand privacy issues and suggests that they are aware of their impact on data subjects’ privacy and security, even if the degree to which they feel responsible for safeguarding them varies. Measures that open-source journalists take to protect privacy include altering images and encrypting the data they share with colleagues. The second subsection reveals three prevalent stances among open-source investigators towards using gray information: supportive, undecided, and opposed. RQ3 focused on the role of organizations in providing ethical guidance to their members and is answered in the third subsection. Findings include active inter-organizational involvement and a lack of privacy-related guidelines.

Altering images and encrypting shared data

As conveyed by the literature, open-source investigators collectively stress that the privacy and security situations of individuals who provide or feature in open-source data can be negatively affected by their actions. Nonetheless, the degree to which open-source journalists feel responsible for protecting these individuals differs. One recurring opinion in the interviews was that if other (international) news organizations have already employed certain content, any potential ‘damage’ has already been done, so the participant’s responsibility of safeguarding the privacy and security of the data subjects ‘expires.’ This thought is expressed by P5:

Choosing to use a sensitive video depends on whether it has been shared before. We do look at other news organizations. If, for example, the BBC has posted the video before, what we do with it does not matter anymore. Of course, we think about the risk of endangering people, but if their content has already been featured, they could already be in danger (Interview, P5).

Whereas some participants echo the above statement, others feel the duty of protecting the privacy and security of data subjects or providers should be independent of the actions of other news outlets. P1, for example, thinks that “journalists should no matter what avoid playing an active role in enabling governments to track people,” and P3 stressed that “even if the content is already circulating widely, we still attempt to protect the privacy of the content subject or provider.”

Participants indicate they treat each privacy-related case as a unique one “that deserves and receives custom treatment and personal attention” (P3) and “is not benefited by a standardized solution” (P2). Besides clarifying that there are no official open-source privacy-protection guidelines, participants reveal they enjoy relative freedom when deciding whether and how to protect data subjects and providers. This freedom leads some participants to draft guidelines for themselves. Abiding by self-established rules helps them handle people’s privacy cautiously. When P4 refers to his self-imposed rule of never posting videos taken from within homes, he mentions the spotlight effect:

When something like a bombardment is filmed from within a home or apartment, it can be geolocated. We know how to do this, and so do other people. In the Ukraine-Russia conflict, it has happened that Russian intelligence officers geolocated a publicly available video, after which they bombed the place it was filmed in. That’s the spotlight effect. So, I never ever embed videos with risky content filmed from within homes.

Just like P4, P3 mentioned never embedding content shot from within homes in his news items. Avoiding specific content rules out the risk of endangering data subjects and brokers, but “if you want to report on areas where there is no free press, you need methods to safely use personal content because sometimes it is all there is” (P5).

Participants mention using a wide array of measures to protect the privacy of data subjects when embedding their content. Removing content metadata, which contains geolocational details, is a recurring measure that decreases the possibility of localizing data subjects or brokers. Removing watermarks, for example, on TikTok videos, is another way to complicate finding the content creator. Often, participants mention blurring faces and usernames to conceal identities. Besides image alteration and deletion of personal details, data subjects’ security is considered when open-source journalists share collected data (sets) amongst colleagues and community members. P2 said: “I call it ‘good data hygiene’ - the idea that all the data that you deal with is secured. We established a kind of danger-handling model for this.” This model contains questions that P2 and colleagues pose to themselves when sharing data: “What’s the potential danger? What could happen in the worst case? How will we prevent this?” P2 treats the answers to the questions of the danger-handling model as guidelines to keep data from being intercepted by malicious parties.
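P2’s danger-handling model, as described above, amounts to three questions answered before data are shared. A minimal sketch of how such a checklist could be recorded follows; the three questions are quoted from the interview, while the class name, field names, and the rule that every question must be answered before sharing are hypothetical additions for illustration, not a description of the participants’ actual tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three questions quoted by P2; everything else in this sketch is hypothetical.
QUESTIONS = (
    "What's the potential danger?",
    "What could happen in the worst case?",
    "How will we prevent this?",
)


@dataclass
class DangerAssessment:
    """Answers to the danger-handling questions for one dataset (illustrative only)."""

    dataset: str
    answers: dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> list[str]:
        """Return the questions that have no non-empty answer yet."""
        return [q for q in QUESTIONS if not self.answers.get(q, "").strip()]

    def ready_to_share(self) -> bool:
        """Assumed rule: share only once every question has been answered."""
        return not self.unanswered()


# Hypothetical usage with placeholder answers.
assessment = DangerAssessment(
    dataset="example-dataset",
    answers={QUESTIONS[0]: "Content creators could be identified."},
)
print(assessment.ready_to_share())
print(assessment.unanswered())
```

Recording the answers alongside the dataset keeps the reasoning available to colleagues who later receive the data, which fits the participants’ emphasis on ongoing dialogue over fixed rules.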

Some unexpected results regarding privacy protection surfaced during the interviews. First, P1 mentioned that civilians themselves are increasingly aware of the dangers of posting content, which in Ukraine specifically led people to post less sensitive content in order to avoid problematic consequences: “Gradually, Ukrainians realized that there is a large online community constantly analyzing civilian war content.” Second, participants mentioned feeling responsible for the safety and privacy of “bad actors,” like compilers of child pornography networks or proclaimers of hate speech. P2 said: “You don’t want their identity to surface either, for they might harm or even kill themselves when that happens – something that I don’t want to contribute to.” P7, who often uses Tweets to expose public discourses, illustrated his stance on this topic with an example:

To me, the size of someone’s reach and their public position are determinants of whether I anonymize them in my news items. When a relatively unknown civilian Twitter user tweets hate speech, I generally do not expose their account. That’s because I see journalistic relevance in reporting on assertions, not in reporting on a random person. But if a politician tweets hate speech, I will not keep them anonymous. Because then there is also journalistic relevance in featuring the person.

P7 elaborated on the above statement by clarifying his inability to foresee the consequences of embedding content of a relatively unknown individual, which leads him to deal with their privacy situations carefully. Politicians, however, are already in the public eye. P7 states that, therefore, whether journalists anonymize them or not is not important: “Politicians’ comments are already widespread.”

Attitudes toward the use of gray information

Open-source journalists’ stances on the use of gray information (semi-legal data, often of a questionable nature) vary. Three dominant attitudes towards using gray information were identified during the coding process: supportive of use, undecided about use, and opposed to use. Despite their differing opinions, all participants admitted that gray information plays a significant role in their investigative processes. This was evident from the illustrative cases of gray information use that all respondents provided.

Open-source journalists who supported the use of gray information generally underpin their views with the explanation that, often, the key to a story’s crux is found within gray information. P1’s statement illustrated this: “Often, [gray information] is a necessary form of information disclosure. I think it can and should always be used as long as [we] don’t have to pay a bad actor – like a Blackhat hacker – for it.” In line with P1’s view, P7 is also supportive of using gray information, stating that “the importance of the investigative goal is often greater than the legality of the means.”

P4 also favors using gray information: “If it is in the public interest and relevant to our research, [we] must use gray information.” When asked to provide an example, P4 talked about infiltrating invitation-only Telegram groups with a fake identity, “sort of as a digital undercover agent,” in which he retrieved information that served as crucial evidence in an open-source news item. Supporters of using gray information emphasize their close relations with their organizations’ legal department, for their positive attitude towards using it does not imply they are willing to risk legal implications.

Some participants have an undecided stance toward using gray information. Generally, their opinions on using it or not differ for every case. P5 stressed that his undecidedness towards using gray information stems from the frequent inability to verify the data:

Often, semi-legal data is even harder to verify than legal, open data. If the data is already leaked, I believe we can use it, it’s just that it’s hard to know whether it’s real. For example, we once refrained from using leaked audio recordings of Russian soldiers communicating with each other about losing a battle. It would really complement our story, but we didn’t know if it was real or fake and potentially posted by Ukrainian soldiers.

Apart from the difficulty of verification, participants attribute their skepticism towards using gray information to the risk of breaking the law. What makes them undecided is the need to weigh the relevance and necessity of gray information for telling a story against the legal constraints on using it. P3 illustrated the struggle to keep this balance:

If we need leaked or hacked data because the story cannot be told otherwise, we must choose between telling a story with illegal data or not telling it at all... We then must decide the importance of the story. How important is it that people know about this? Is this importance high enough to use illegal data? It’s a continuous discussion and struggle.

Although participants with doubts about using gray information indicate they struggle with it, their final decisions often lean towards using it. This decision is generally made after dialogue with colleagues, editors-in-chief, and the legal department.

A minority of participants are opposed in principle to using gray information. The main argument against using semi-legal or illegal data is put into words by P6: “I never use it because to me it is not an option, and in the Netherlands, it is also unnecessary. I always find legal ways to get the information I need.” The claim that illegal or dubious sources are unnecessary is echoed by the other participants who oppose using gray information. P8 is one of them and argues that “generally, the more tech-savvy you are, the better you are at getting all the information you need in a way that is actually legal.”

Active intra-organizational collaboration and lack of guidelines

Participants unanimously mentioned the active and daily involvement of their editorial colleagues and their organization’s legal department when dealing with privacy-related issues.

First, collaboration between colleagues, often in the form of brainstorming sessions or quick meetings, serves to ensure that a fitting privacy measure is taken. In addition, the continuous conversations between colleagues and editors-in-chief help assure the legality of the open-source investigations that are conducted and of the news items they fuel. Participants often prefer to keep the legal department tuned in to their investigative processes to ensure that nothing can be held against them if trouble arises once their news items are published. About this, P2 stated:

If the identity of an individual surfaces inadvertently and it is not our fault but that of another news organization or person, we can still get accused of it. That can result in a legal hassle. It is very convenient to have discussed the entire process with the legal department so they are aware of all my steps and know that I have been cautious. Conversations with the legal department are hugely important, even if they are time-consuming.

OSINV journalists prefer that the legal department, when available, monitors their investigative processes to ensure the legality of their methods and, as indicated above, so that they can rely on it if legal accusations arise. As P6 commented: “I always clarify my information sources because otherwise, it is impossible for them [colleagues and legal departments] to trace back my steps – suppose someone drags you in front of the journalism council, you just want to be able to show where something came from.” P8 thinks that close relations with the legal department are important for legally protecting not only yourself but also your colleagues. They stated:

If you did something illegal, didn’t communicate about it properly, and it surfaces, you put not only yourself but also your colleagues and maybe even the whole organization in a bad light. You have to take responsibility for yourself and also for the team around you.

Although open-source journalists are in close contact with their colleagues and the legal department, guidelines about privacy issues are often not available within their organizations. P1 stated that “within [my organization], we have never found the time to draw up guidelines since the open-source department was only established about a year ago.” P7 also blames time constraints for the absence of guidelines: “Open source is developing so quickly that news organizations haven’t found the time to reflect and create rules.” Some participants indicated they would prefer guidelines to adhere to. P4 said: “We don’t have any guidelines, and I think that’s crooked: I wish we did have rules. We do talk about privacy considerations with colleagues continuously, but we don’t have standardized rules, and I think we should have them.”

Apart from the absence of guidelines, digital open-source journalists indicated they are not obliged to receive training on how to deal with privacy-related aspects of their work. Most participants have taught themselves how to conduct open-source investigations and turned to the occupation out of intrinsic motivation. Participants mention teaching and helping each other when dealing with privacy-related issues. Notably, this reciprocal instruction is not limited to the protection of data subjects and brokers but also includes tips and recommendations about the digital privacy of open-source journalists themselves. P3 explained:

We strongly advise [colleagues] to encrypt their hard drives and collected datasets. Honestly, I sometimes dread explaining how to do this because manual encryption techniques are not very user-friendly. However, encryption and data protection are extremely important and necessary, also when sharing files with colleagues. It can feel exaggerated at times, but it also protects you from legal implications if something goes wrong: Then, you can prove that you tried everything to prevent privacy mistakes.

Although peer tutoring within organizations is common, two participants are dissatisfied with some colleagues’ lack of awareness about how to protect the security of data brokers and subjects. P6 is one of them and admitted: “Honestly, sometimes I feel the editorial responsibility about the security of people just isn’t what it should be: often, insufficient thought is spent on it.”

Discussion

Existing research on digital open-source investigations and privacy issues emphasizes the power that investigators hold over the safety and privacy situations of the individuals who provide and feature in the content used for research (Pastor-Galindo et al., 2020; Saugmann, 2019). The results of this article show that digital open-source investigators are aware of the influence of their editorial choices on the safety and privacy of data subjects. Yet, despite this unanimous awareness of their position of power, common strategies to safeguard data subjects have yet to emerge. As per RQ1, for example, many of the digital open-source investigators interviewed take the actions of renowned international news organizations (like the BBC) as a reference point on which to base their privacy-protection choices. These investigators feel that their good intentions and careful measures are futile if the content has already been published by organizations with a broader reach than theirs. Others consider that even then, an independent and non-debatable duty to protect individuals’ safety and privacy remains.

Fiorella’s (2021) notion of ‘the spotlight effect’ seems to encapsulate a view widely shared within the community. It was independently cited by an open-source investigator in connection with a case in which the Russian military bombed a Ukrainian home from which content had been filmed and uploaded. It surfaced again when an open-source journalist explained how they typically refrain from exposing the Twitter accounts of unknown users due to their inability to foresee potentially harmful consequences. The appearance of ‘the spotlight effect’ in the results of this research indicates that open-source journalists are aware of the power they hold over the privacy and security of data subjects. That is, they realize that their reach and that of their organizations can bring unwanted and dangerous publicity to individuals who are involved in uploading certain content.

The privacy-protection measures that digital open-source journalists employ can roughly be divided into two categories: first, the altering of audiovisual content and the removal of its metadata; second, the encryption of collected and shared data. The altering of images and videos, for example, by blurring faces or removing watermarks, serves to prevent identification of the individuals depicted in the media objects. Encrypting collected and shared data is done to keep it from falling into the hands of malicious parties.

Answering our second research question, the concept of gray information, as formulated by Hribar et al. (2014), was recognized by all open-source journalists who participated in this research. The speed with which open-source journalists offered examples in the interviews, and the broad range of those examples, show that gray information plays a significant role in their investigative processes. Digital open-source journalists’ stances towards using gray information can be categorized into three groups: supportive of its use, undecided about it, and opposed to it.

Supporters of using gray information stress that semi-legal or hidden information often contains the crux of an investigative story and is, therefore, necessary to use. Supporters believe that, generally, the importance of a gray information-based news item is greater than the legality of the means to establish it. Open-source journalists with an undecided stance towards using gray information are doubtful due to the frequent inability to verify such data and the fear of legal implications. Regardless, they indicate that they often ultimately choose to use gray information after the reassurance of their colleagues, editors-in-chief, and legal departments. A minority of open-source journalists are opposed to using gray information because they believe it is unnecessary: according to them, there are always legal and open ways to obtain the information they seek.

Results show that open-source investigators’ stances towards gray information are in a way connected to their “gut feeling,” which Schultz (2007) conceptualizes as a (journalistic) news-making process based on self-evident and self-explanatory judgments. Both supporters and opponents of the use of gray information base their attitudes on their personal, self-evident, and self-explanatory assessment of whether using such information is necessary. However, open-source journalists who are mostly undecided about using gray information base their ultimate decisions on whether the data is verifiable, as well as on the opinions of their colleagues, editors-in-chief, and legal departments.

Furthermore, the results of this research are in line with Belghith et al.’s (2022) finding that open-source newsrooms and organizations emphasize communication, transparency, and strong bonds between members. Open-source journalists are in daily, direct, and close contact with colleagues, editors-in-chief, and their organization’s legal department. Meetings serve to brainstorm about privacy measures, discuss legal issues, and exchange knowledge.

Open-source investigators’ narratives reveal that their work relies on informal peer control and their own personal adherence to the “do no harm” and “public interest” principles rather than a professionally and widely agreed code of ethics. This grants them freedom in their investigative processes and privacy-related choices but also leads to dissatisfaction: regarding our RQ3, open-source investigators indicate they would prefer having specific rules and guidelines to guide them when making privacy-related decisions. The lack of guidelines leads some journalists to establish rules and models themselves. However, this does not happen in isolation. A key finding from the interviews is the close collaboration between open-source journalists and their organizations’ legal departments. Open-source journalists are particularly concerned with the legality of their investigative processes and attempt to avoid potential legal repercussions. They prefer to keep the legal department tuned in to their investigative processes to rule out the possibility of receiving personal backlash or affecting the credibility of their organization.

Based on the findings of this paper, Saugmann’s (2019) concerns about the fate of civilians featured in the material used for open-source investigations need to be updated. Although OSINV practitioners lack a set of clear ethical guidelines commonly agreed upon as part of professionalization efforts (as is the case with traditional investigative journalism), they still implement well-thought-out and effective measures to avoid harming the producers of the content they rely on. The references to the implications of ‘the spotlight effect’ are indicative of an emerging ethical baseline. Something similar can be said about the references to intra-organizational consultation, of which a fundamental goal is to ensure that the most effective protection measures are taken collectively. In summary, a formal normative infrastructure that guides open-source investigators with privacy issues is still a work in progress. Therefore, while investigators still decide for themselves how to navigate privacy issues and where the boundaries of transparency and fairness lie, shared practices point to the possibility of agreeing upon a common baseline across the community of practice.

Conclusion

This paper examines how open-source investigators understand and mitigate privacy issues and balance them with their investigative goals. Given the importance of what we have called “gray zone information” for the community, we have also focused on that specific aspect of open-source investigations. Our findings indicate that OSINV practitioners navigate privacy issues through dialogues and brainstorming sessions with colleagues and through self-evident and self-explanatory individual reflection and assessment (RQ1). They balance privacy issues with their investigative goals by deciding whether journalistic interest is greater or less than the privacy and security needs of data subjects (RQ1). Furthermore, the results show that open-source investigators are aware of the power they hold over the security and privacy situations of the subjects of the data they use. However, the strategies that open-source investigators implement to protect people’s privacy and security differ. Additionally, open-source journalists’ stances towards the use of gray information range from supportive, through undecided, to opposed (RQ2). These attitudes depend on open-source journalists’ personal assessments of the necessity of gray information and on the opinions of their colleagues, editors-in-chief, and members of their organization’s legal department. Lastly, the newsroom or organization for which open-source journalists work plays a significant role in their privacy-related decisions and actions (RQ3). Contact between open-source journalists, editors-in-chief, and members of the legal department serves to brainstorm about privacy measures, discuss legal issues, and exchange knowledge. Results show that open-source newsrooms and organizations do not enforce privacy-related rules or frameworks on their journalists.

With one exception, the open-source journalists who were sampled for this article originate from and reside in the Netherlands. This fact, together with the relatively small number of interviews conducted for this research, stands in the way of any generalization. Additionally, bias caused by the snowball sampling method must be recognized: when individuals suggest other individuals from their network, the sample is not random. The sample is also skewed in terms of gender. Future research could establish whether these limitations distort the findings in any way.

Supplemental Material

Supplemental Material - The ethics of open source investigations: Navigating privacy challenges in a gray zone information landscape

Supplemental Material for The ethics of open source investigations: Navigating privacy challenges in a gray zone information landscape by Maartje van der Woude, Tomás Dodds and Guillén Torres in Journalism

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Tomás Dodds

Supplemental Material

Supplemental material for this article is available online.

Author biographies

Maartje van der Woude is a journalist with a master’s degree in journalism and new media from Leiden University. She holds a Bachelor’s in Sociology from the University of Amsterdam.

Tomás Dodds is an Assistant Professor in Journalism and New Media at Leiden University and a Faculty Associate at the Berkman Klein Center for Internet & Society at Harvard University. He is also a researcher in the AI, Media & Democracy Lab in the Netherlands and the Artificial Intelligence and Society Hub [IA+SIC] in Chile.

Guillén Torres is a lecturer and researcher at the Media Studies department of the University of Amsterdam. His main research interest is how data fosters human agency, with a focus on activism and open source investigations.

References

Agre and Rotenberg (1998) Technology and Privacy: The New Landscape. Washington, DC: MIT Press.

Asian College of Journalism (Director) (2021) Open-Source Intelligence (OSINT) by Giancarlo Fiorella, Investigator and Trainer at Bellingcat. https://www.youtube.com/watch?v=AYKRE9WGSV4

Belghith, Venkatagiri and Luther (2022) Compete, collaborate, investigate: exploring the social structures of open source intelligence investigations. In: Proceedings of the 2022 CHI conference on human factors in computing systems, New Orleans LA, 29 April–5 May 2022, pp. 1–18. Association for Computing Machinery. DOI: 10.1145/3491102.3517526.

Berglez and Gearing (2018) The Panama and Paradise papers. The rise of a global fourth estate. International Journal of Communication 12: 4573–4592.

Berkeley Protocol on Digital Open Source Investigations (2022) Human rights investigations lab, 99. https://humanrights.berkeley.edu/projects/berkeley-protocol-on-digital-open-source-investigations/

Carson (2021) The digital spotlight: applying a connective action framework of political protest to global watchdog reporting. The International Journal of Press/Politics 26(2): 362–384. DOI: 10.1177/1940161220912679.

Carson and Farhall (2018) Understanding collaborative investigative journalism in a “post-truth” age. Journalism Studies 19(13): 1899–1911. DOI: 10.1080/1461670X.2018.1494515.

Cooper and Mutsvairo (2021) Citizen journalism: is Bellingcat revolutionising conflict journalism? Insights on Peace and Conflict Reporting. London, UK: Routledge.

Di Salvo (2022) Information security and journalism: mapping a nascent research field. Sociology Compass 16(3): e12961. DOI: 10.1111/soc4.12961.

Dodds (2021) Structures of Resistance: Citizen-generated Reporting in Times of Social Unrest. Politics of Disinformation. Hoboken, NJ: John Wiley & Sons, Ltd., 119–131. DOI: 10.1002/9781119743347.ch9.

Dodds (2022) Newsroom Dissonance: How New Digital Technologies are Changing Professional Roles in Contemporary Newsrooms [Doctoral thesis, Leiden University]. Available at: http://hdl.handle.net/1887/3270873.

Dubberley and Ivens (2022) Outlining a Human-Rights Based Approach to Digital Open Source Investigations: A Guide for Human Rights Organisations and Open Source Researchers (Human Rights, Big Data and Technology Project). Colchester, England: University of Essex.

Dubberley, Koenig and Murray (2020) Digital Witness: Using Open Source Information for Human Rights Investigation, Documentation, and Accountability. Oxford: Oxford University Press.

Edwards (2022) Open-Source Journalism in a Wired World. Nieman Reports. https://niemanreports.org/articles/open-source-journalism/

Eijkman and Weggemans (2012) Open source intelligence and privacy dilemmas: is it time to reassess state accountability? Security and Human Rights 23(4): 285–296.

Gauthier (2002) Privacy invasion by the news media: three ethical models. Journal of Mass Media Ethics 17(1): 20–34. DOI: 10.1207/S15327728JMME1701_03.

Gubrium and Holstein (eds) (2002) Handbook of Interview Research: Context and Method. Thousand Oaks, CA: Sage.

Guest, Namey and Chen (2020) A simple method to assess and report thematic saturation in qualitative research. PLoS One 15(5): e0232076. DOI: 10.1371/journal.pone.0232076.

Guidelines for Public Interest OSINT Investigations (2023). ObSINT. https://obsint.eu/guidelines-for-public-interest-osint-investigations/

Hammond (2017) From computer-assisted to data-driven: journalism and big data. Journalism 18(4): 408–424. DOI: 10.1177/1464884915620205.

Hansen (2015) The Homo Sacer of open-source journalism. European Journal for the Philosophy of Communication (Empedocles) 6(1): 21–38. DOI: 10.1386/ejpc.6.1.21_1.

Hribar, Podbregar and Ivanuša (2014) OSINT: a “grey zone”. International Journal of Intelligence & Counter Intelligence 27(3): 529–549. DOI: 10.1080/08850607.2014.900295.

Karlsson (2010) Rituals of transparency. Journalism Studies 11(4): 535–545. DOI: 10.1080/14616701003638400.

Kelty (2008) Two Bits: The Cultural Significance of Free Software. Durham: Duke University Press.

Lamdan (2022) Data Cartels: The Companies that Control and Monopolize Our Information. Stanford, CA: Stanford University Press.

Lewis and Usher (2013) Open source and journalism: toward new frameworks for imagining news innovation. Media, Culture & Society 35(5): 602–619. DOI: 10.1177/0163443713485494.

Manninen VJE (2017) Sourcing practices in online journalism: an ethnographic study of the formation of trust in and the use of journalistic sources. Journal of Media Practice 18(2–3): 212–228. DOI: 10.1080/14682753.2017.1375252.

McGregor, Watkins and Caine (2017) Would you slack that? The impact of security and privacy on cooperative newsroom work. Proceedings of the ACM on Human-Computer Interaction 1(CSCW): 1–22. DOI: 10.1145/3134710.

Meijer, Conradie and Choenni (2014) Reconciling contradictions of open data regarding transparency, privacy, security and trust. Journal of Theoretical and Applied Electronic Commerce Research 9(3): 32–44. DOI: 10.4067/S0718-18762014000300004.

Mejias and Couldry (2019) Datafication. Internet Policy Review 8(4). https://policyreview.info/concepts/datafication

Methodology for Online Open Source Investigations Into Incidents Taking Place in Ukraine Since (2022). Justice and accountability unit - Bellingcat. https://www.bellingcat.com/app/uploads/2022/12/JA-Manual-for-PUBLICATION.pdf

Müller and Wiik (2023) From gatekeeper to gate-opener: open-source spaces in investigative journalism. Journalism Practice 17(2): 189–208. DOI: 10.1080/17512786.2021.1919543.

Pastor-Galindo, Nespoli, Gómez Mármol, et al. (2020) The not yet exploited goldmine of OSINT: opportunities, open challenges and future trends. IEEE Access 8: 10282–10304. DOI: 10.1109/ACCESS.2020.2965257.

Perdomo and Rodrigues-Rouleau (2022) Transparency as metajournalistic performance: the New York Times’ Caliphate podcast and new ways to claim journalistic authority. Journalism 23(11): 2311–2327. DOI: 10.1177/1464884921997312.

Phillips (2010) Transparency and the new ethics of journalism. Journalism Practice 4(3): 373–382. DOI: 10.1080/17512781003642972.

Rambukkana (2019) The politics of gray data: digital methods, intimate proximity, and research ethics for work on the “alt-right”. Qualitative Inquiry 25(3): 312–323. DOI: 10.1177/1077800418806601.

Robinson (2011) “Journalism as process”: the organizational implications of participatory online news. Journalism & Communication Monographs 13(3): 137–210. DOI: 10.1177/152263791101300302.

Rogers (2020) Deplatforming: following extreme Internet celebrities to Telegram and alternative social media. European Journal of Communication 35(3): 213–229. DOI: 10.1177/0267323120922066.

Saugmann (2019) The civilian’s visual security paradox: how open source intelligence practices create insecurity for civilians in warzones. Intelligence and National Security 34(3): 344–361. DOI: 10.1080/02684527.2018.1553700.

Schultz (2007) The journalistic gut feeling. Journalism Practice 1(2): 190–207. DOI: 10.1080/17512780701275507.

van der Velden (2021) Public OSINT: What is Open Source in Open Source Investigations? https://data-activism.net/2021/06/public-osint-new-working-paper-by-lonneke-van-der-velden/

Vuyst (2020) Hacking Gender and Technology in Journalism. New York, NY: Routledge.

Whitehouse (2010) Newsgathering and privacy: expanding ethics codes to reflect change in the digital media age. Journal of Mass Media Ethics 25(4): 310–327. DOI: 10.1080/08900523.2010.512827.

Young (1999) Giving it away: how red hat software stumbled across a new economic model and helped improve an industry. Journal of Electronic Publishing 4(3). DOI: 10.3998/3336451.0004.304.
