Abstract
This article argues that due to the difficulty in governing AI, it is essential to develop measures implemented early in the AI research process. The goal of dual use considerations is to create robust strategies that uphold AI’s integrity while protecting societal interests. The challenges of applying dual use frameworks to AI research are examined and dual use and dual use research of concern (DURC) are defined while highlighting the difficulties in balancing the technology’s benefits and risks. AI’s dual use potential is discussed, particularly in areas like NLP and LLMs, and the need for early consideration of dual use risks to ensure ethical and secure development is underscored. In the section on shared responsibilities in AI research and avenues for mitigation strategies the importance of early-stage risk assessments and ethical guidelines to mitigate misuse is emphasized, accentuating self-governance within scientific communities and structured measures like checklists and pre-registration to promote responsible research practices. The final section argues that research ethics committees play a crucial role in evaluating the dual use implications of AI technologies within the research pipeline. The need for tailored ethics review processes is articulated, drawing parallels with medical research ethics committees.
Keywords
Introduction
As an emerging technology, artificial intelligence (AI) has significant disruptive potential that has not yet fully developed. Its evolution has led to various applications, each posing different risks at varying levels of severity and on different timescales, depending on the technology and deployment context (EU, 2024; NAIAC, 2023; UK Government, 2024; UNESCO, 2022). As an emerging and general-purpose technology, AI has multifaceted applications that can yield both beneficial and harmful outcomes, making dual use considerations critical. Unlike well-regulated fields like medicine, AI research lacks established professional codes of conduct. This situation mirrors the unregulated innovation in the life sciences described by Bruno Latour in the 1990s, driven by market economics and an eagerness to explore all possible research paths (Latour, 1993). As AI artifacts (such as code, data, and models) proliferate in research and commercial endeavors, examining their rightful use and potential misuse becomes increasingly important. For instance, as large language models (LLMs) are integrated into a wide range of consumer products, dual use considerations become crucial for both research and deployment (Kaffee et al., 2023). The general goal of dual use considerations is to develop robust strategies that uphold AI's integrity while safeguarding societal interests.
The rapid evolution of AI research and product development necessitates and has intensified discussions on dual use concerns. Although there is growing literature on AI’s dual use characteristics, more attention has been focused on unintentional misuse, such as algorithmic bias and value alignment issues (Christian, 2020; Kirkpatrick, 2016; Zweig, 2022). During the past few years and with the advent of consumer products and a general AI hype, contributions focusing on dual use concerns regarding AI have been put forward, many focusing on natural language processing and large language models (e.g. Bernstein et al., 2021; Brundage et al., 2018; Gamage et al., 2021; Grinbaum and Adomaitis, 2024; Kaffee et al, 2023; Kania, 2018; Ratner, 2021; Shankar and Zare, 2022; Schmid et al., 2022; Urbina et al., 2022). A seminal report on AI threats underscores the dual use nature of AI and its research: “AI is a dual-use area of technology. AI systems and the knowledge of how to design them can be put toward both civilian and military uses, and more broadly, toward beneficial and harmful ends” (Brundage et al., 2018: 16). The authors note that AI advancements can augment existing threats and introduce novel risks, fundamentally reshaping the threat landscape (Brundage et al., 2018). As AI capabilities become more powerful and widespread, the use of AI systems will lower attack costs, expand the set of actors capable of carrying out attacks, increase the rate of attacks, and broaden the range of potential targets. New threats may also emerge when AI is used for tasks impractical for humans or when malicious actors exploit AI system vulnerabilities. These changes impact digital, physical, and political security, highlighting the comprehensive nature of AI-related risks.
AI impacts security in several domains, both offensively and defensively (Calderaro and Blumfelde, 2022). In cybersecurity, AI can discover and exploit vulnerabilities and patch them. In disinformation campaigns, AI can disseminate credible disinformation and detect and destroy undesired propaganda content (Whyte, 2020). AI can also be used in statecraft as a fiscal tool to counter illicit funds and in defense to improve logistics and predict maintenance needs (Carrozza et al., 2022). Autonomous systems in military applications can identify and attack targets without human interaction (Haner and Garcia, 2019). These examples show that the dual use dilemma encompasses both civil-military applications and high-risk-high-benefit considerations. Digital technologies, including AI, are often discussed in the context of military applications, with concerns about cyberspace skirmishes escalating into conventional warfare and the role of autonomous weapons (Christie et al., 2023; Sanger, 2023; Scharre, 2018; Taddeo and Floridi, 2018).
However, there is limited analysis of dual use aspects specific to generative AI research and LLMs (Grinbaum and Adomaitis, 2024). Lin (2016) argues that information technologies should not be regarded as dual use in the same way as physics, biology, and chemistry because they are general purpose, making meaningful governance challenging. Supporting Lin’s argument, until recently, there were almost no trade regulations for IT, apart from cryptography and intrusion software (Dullien et al., 2015; Pyetranker, 2015), illustrating the difficulties of governing software as a dual use good (Korzak, 2020; Riebe and Reuter, 2019; Vella, 2017). There is an ongoing debate on how to mitigate dual use risks in AI research that in part is rooted in discussions on aspects specific to AI research. The issue of dual use in AI research will be elaborated in the second section of the article, combining orthodox considerations from dual use research like the temporal aspect of the dual use pipeline with specific aspects of AI research that necessitate dual use considerations on the one hand, but complicate them too. Building on this analysis, sections 4 and 5 examine possible strategies to deal with dual use in AI research and argue for the importance of research ethics committees (RECs), as these bodies are organized from within the scientific sphere and can align epistemic claims with all kinds of implications that stem from dual use research. In the next sections, I will define dual use in the context of AI research, provide examples of dual use risks, and conclude with responsibility and mitigation strategies regarding dual use concerns in AI research.
Challenges regarding the application of dual use frameworks in AI research
This section is dedicated to defining dual use in AI research und to showing where difficulties remain when aiming at a definition. First, the concepts of dual use and dual use research of concern are briefly introduced, before AI research is characterized regarding the aspects of general applicability and ease of dissemination. Prominent definitions for dual use in AI research are also discussed.
Definitions of dual use in general
At the core of dual use considerations lie dilemmas that occur when preventing the harms from misuse might mean losing out on the benefits of a technology. Prohibiting dual use technologies is impractical due to their significant benefits, highlighting the core dual use dilemma of balancing advantages with substantial risks (Tucker, 2012). The respective dilemmas pose the challenge of finding a balance between the risks and rewards involved in developing, sharing, and controlling hazardous technologies. In general, the term “dual use” is used to describe an ambiguity of some forms and applications of technology or the research (possibly) leading to this kind of technology (Forge, 2010; Oltmann, 2015; Riebe, 2023). Liebert and Schmidt (2018: 54) define dual use as “a dichotomy of effects, impacts and opportunities that [. . .] occur simultaneously and that are intrinsically linked to a technology.” This broad definition encompasses various aspects that are discussed in the literature, where the term dual use is itself far from universally agreed upon and used in various contexts and meanings. Rath et al. (2014), for instance, identify five basic dichotomies in dual use concepts: (1) civilian versus military use, (2) benevolent versus malevolent purpose, (3) peaceful versus non-peaceful purpose, (4) legitimate versus illegitimate purpose, and the (5) good military and good civilian purpose.
Using the disparity between basic research and product development as a foundation, Forge (2010) differentiates between dual use knowledge, on one side, and the research outcomes, such as technologies and artifacts, on the other. Incorporating both aspects, he proposes the following definition for dual use: “An item (knowledge, technology, artefact) is dual use if there is a (sufficiently high) risk that it can be used to design or produce a weapon, or if there is a (sufficiently great) threat that it can be used in an improvised weapon, where in neither case is weapons development the intended or primary purpose” (Forge, 2010: 117). While the focus on weapons seems too narrow when discussing the dual use potential of AI research, the category of risk helps to adequately describe the need for balancing possible benefits and harms during research and development. And based on this concept of risk-realization, the term dual use can be applied to all the phases of research and development, from fundamental research to application development. For the context of publicly funded research and universities, considerations regarding dual use are especially fruitful when they focus particularly on the process of research and knowledge production (Drew and Mueller-Doblies, 2017; Resnik, 2009).
“Dual use dilemma” is the name for “the problem that the same material, technologies or knowledge might be used for wanted and unwanted purposes” (Rath et al., 2014: 770). The challenge is to specify where the realm of unwanted purposes begins and what measures can be used to strike a balance between the free development of research and design processes on the one side and security concerns on the other. Grinbaum and Adomaitis (2024) stress that one possible criterion for classifying a technology as dual use would be to require a demonstrable potential for large-scale harm through the malevolent or negligent use of this technology. This introduces the aspects of scale and risk, as opposed to a mere possibility of one-shot effect. The considerations of scope and risk make the notion of dual use more precise, and they point to specific frameworks of evaluating and addressing the risks of harm as well as responsibilities involved.
Dual use research of concern
Dual Use Research of Concern (DURC) involves research that, while intended for beneficial purposes, has the potential to be misused, posing significant threats to public health, agriculture, animals, the environment, or national security. In the context of dual use research, the dual use item is the research itself. This research can produce a variety of outcomes with dual use potential: knowledge, technologies used or developed during the research, and artifacts built with the help of this knowledge and technology (Forge, 2010). DURC encompasses experiments, techniques, or findings that could be repurposed to cause harm, including research on pathogens, toxins, and biotechnology. The most accepted definition of DURC comes from the Fink Report: “research that, based on current understanding, can be reasonably anticipated to provide knowledge, products, or technologies that could be directly misapplied by others to pose a threat to public health and safety, agriculture, plants, animals, the environment, or material” (National Research Council, 2004).
DURC presents a conflict between two paramount values in responsible research and innovation: the pursuit of knowledge and the safeguarding of public safety. At its core, DURC encapsulates a utilitarian dilemma, where a single scientific endeavor can simultaneously pose significant security threats and yield vital societal benefits. This dilemma is not exclusive to the fields of chemistry or biology; it extends to other research areas. Dual use concerns have been extensively explored across various fields, particularly in nuclear technology and the life sciences, due to their implications for safety and security (NSABB, 2007; Tucker, 2012). The interpretation of dual use varies with the technology and its associated risks. For example, nuclear technology is less accessible than biotechnology, which in turn is less accessible than IT (Riebe and Reuter, 2019). Nuclear dual use applications involve both civilian and military uses, controlled largely by nation-states. Conversely, life sciences technologies are more accessible and pose significant risks of misuse by terrorist groups or accidental release (Riebe and Reuter, 2019). To address these risks, concepts like DURC have been introduced by entities such as the US National Academies of Sciences, Engineering, and Medicine (2017) and the World Health Organization (WHO, 2022). Beyond life sciences and chemistry, dual use concerns also extend to technological development, where security issues focus on preventing malicious use and safety concerns involve secure handling of materials and information (Harris, 2016).
Dual use research produces knowledge, technologies, or artifacts that can be used in two ways. The first is the civilian and benevolent use intended by the researchers or those enabling the research (e.g. research institutions like universities). The second use contradicts this purpose, as it involves military or malicious applications, threatening physical, political, or digital security.
There is an important temporal aspect of DURC: ethical considerations in research hold varying significance within these domains, vividly illustrating Collingridge’s dilemma (Collingridge, 1980). This dilemma shares some similarities with dual use dilemmas, emphasizing the challenge of identifying potential harm stemming from research and development outcomes at the right time. In its early stages, when alterations are more feasible, predicting the application and consequences of technology is challenging. As it matures, adjusting becomes more costly: “When change is easy, the need for it cannot be foreseen; when the need for change is apparent, change has become expensive, difficult and time-consuming” (Collingridge, 1980). Interventions in basic research may seem straightforward but risk excessive regulation, poorly justified decisions, and the non-development of fruitful technologies. In contrast, restrictions in later stages, such as during product development or export control, can entail substantial financial losses along with administrative burdens (like oversight). As technology advances, it becomes increasingly realistic to evaluate its potential for harmful applications. Consequently, compared to phases of basic research, it is during product development that stricter dual use regulations are implemented, primarily centered around the products themselves (Alavi and Khamichonak, 2017; Wassenaar Arrangement Secretariat, 2023). This temporal dimension of DURC has been called the dual use pipeline (Koplin, 2023; Tucker, 2012). The process of materializing dual use risks involves various stages. There are some main steps, which could be broken down further to achieve a more fine-grained picture. Firstly, research with dual use potential must be initiated, executed, and eventually lead to a dual use discovery. Secondly, this discovery, in the form of dual use knowledge or technology, must reach malevolent actors. Thirdly, these actors then need to effectively employ the technology. As several steps are involved before malevolent actors can cause substantial harm using dual use technologies, there are several points where intervention can take place.
Aspects of AI research that is potentially dual use
AI research involves the study and development of algorithms, models, and systems that enable machines to perform tasks that typically require human intelligence, such as learning, reasoning, problem-solving, and understanding natural language. It spans various domains including machine learning, neural networks, computer vision, and robotics, aiming to create intelligent systems that can autonomously adapt and improve from experience. Natural Language Processing (NLP) and Large Language Models (LLMs) represent specific domains within AI that are based on deep learning techniques. These systems rely on training neural networks with extensive datasets, enabling related programs to analyze and generate natural language text.
AI research qualifies as DURC because it holds significant potential for both beneficial and harmful applications (Grinbaum and Adomaitis, 2024). While AI can revolutionize fields such as healthcare, security, and industry by enhancing efficiency, diagnosing diseases, and improving cybersecurity, it also poses substantial risks if misused (Brundage et al., 2018; UK Government, 2024). For instance, AI can be weaponized to develop autonomous systems for warfare, enhance cyberattacks, conduct invasive surveillance, and create sophisticated disinformation campaigns through deepfakes. These dual use potentials necessitate careful oversight and ethical considerations to ensure that AI research advancements do not inadvertently harm public safety, privacy, or national security. This dual potential underscores its classification as DURC, demanding a balanced approach to maximize benefits while mitigating risks.
Compared to other domains of DURC, AI research outcomes and technologies involve an extensive diversity in AI architectures and the complex behavior and emergent capacities inherent in deep learning systems. Nevertheless, they are described as particularly easy to disseminate and employ. Advanced AI research and technology have been characterized as multi-purpose and easy to diffuse (Hovy and Spruit, 2016; Zambetti et al., 2018). The case of LLMs is even more severe since replicating a foundation model is accessible to individuals and the smallest of organizations (Taori et al. 2023; Zhang et al., 2023) and they can now be created inexpensively by a growing number of researchers. LLMs, whether open source or developed by private companies or individual researchers, as exemplified by recent LLaMA-2 models, illustrate this ease of creation (Touvron et al., 2023). Unlike nuclear research, which requires political will at the level of countries, AI research presents a multi-stakeholder dilemma involving individuals, research labs, and businesses. This increased accessibility amplifies the potential for misuse, necessitating the regulation of foundation models (Bommasani et al., 2022) due to their potential to spread misinformation, manipulate, influence, perpetrate scams, or generate toxic language. Specific social risks have been identified by LLM developers (Weidinger et al., 2021). Some of these risks can be emergent, arising unintentionally from training, while others stem from poor design, such as insufficient mitigation against bias or inadequate elimination of toxic outputs. Additionally, output toxicity may be highly context-dependent; for instance, generating a phrase calling someone a dog might be offensive or not, depending on the circumstances (Grinbaum and Adomaitis, 2024). This context dependency is intrinsic to several types of harmful language, including insults and medical advice. Thus, LLMs present risks that are highly contextual. An even more concerning aspect of LLMs lies with the ‘unknown unknowns,’ or potential misuses not foreseen by current foresight and risk evaluation benchmarks (Tamkin et al., 2021). While recent efforts to mitigate LLM-associated risks have focused on specific types of harm, more harmful behavior may emerge in future LLM use. Consequently, Grinbaum and Adomaitis suggest that powerful generative AI research, including on LLMs, should be classified under DURC.
For machine learning, Kempner et al. (2011) point out that it is infeasible to halt the development of machine learning in directions that could lead to negative outcomes. The reason for this is that machine learning “is in most cases a general purpose or dual use technology, meaning that it has general capabilities, which are applicable to countless varying purposes” (Kempner et al., 2011). Also, LLMs are considered a “generalist technology” (Vannuccini and Prytkova, 2021) and a “crossover technology” (Trusilo and Danks, 2024), capable of generating a wide range of outputs across various contexts of application. This generalization, combined with their autonomy, creates risks that are less predictable and more pervasive compared to traditional digital systems. With LLMs there is another risk: their ability to mimic human-like behavior and cognition poses unique risks for impersonation and individual exploitation. Moreover, as machine learning technologies and LLMs play increasingly important roles in decision-making processes in logistics, healthcare, and finance, the pressure on their reliability mounts. If they don’t work the way intended as the primary use, they may get misused for malicious purposes und domains where they can cause significant harm. Assessing AI research is particularly challenging due to its oscillation between basic and applied research. As strict separation between these stages is hardly possible, dual use considerations must occur early in the dual use pipeline. This oscillation complicates defining the dual use-specific aspects of AI research.
Discussing dual use aspects of AI, it is worth mentioning its historical ties to warfare and security, dating back to Turing’s pioneering work, “Can Machines Think?” and the UK Government’s efforts during World War II to decode the German cryptographic language generated by the Enigma machine. Since then, AI advancements have been marked by access to vast pools of big data, improvements in machine learning methodologies, and significant developments in computer processing capabilities (Calderaro and Blumfelde, 2022). Considering the military-civilian distinction, advanced AI research and technology have been relevant for the defense sector's automation efforts (Taddeo et al., 2021). In conflict scenarios, AI technologies further obscure the distinction between civilian and military applications. Present military operations use various AI-assisted tools, from facial recognition systems identifying potential adversaries and casualties in war zones to sensor and navigation aids helping in target selection. Computer vision technology, enabling AI to interpret visual data, is utilized to analyze surveillance drone footage (Marijan and Pullen, 2023). This trend shows the diffusion and adaptation of AI technology, initially designed for civilian use, into military contexts (Liebert, 2013). However, some studies have not observed significant knowledge spillover between civilian and military-industrial sectors (Riebe, 2023; Schmid et al., 2022). Nonetheless, the continuous development of autonomy and automation of military tasks may lead to the incremental development of Lethal Autonomous Weapons (LAWS) rather than a single giant leap (Riebe, 2023). Verbruggen (2019) supports this claim, arguing that there “will be no watershed moment when systems unambiguously definable as LAWS will suddenly be deployed.” Since military AI applications are achieved partly through innovations in non-military research institutions, evaluating dual use concerns early in the dual use pipeline is necessary.
AI Systems are subject to novel security vulnerabilities that need to be considered alongside standard cyber security threats (NCSC, 2023). Strategies like prompt injection attacks or data poisoning are directed toward deep learning systems, deceive the models in a way that is hard to detect and by way of the misclassifications cause the systems to underperform or even fail (Ambrus, 2020). Security issues like these are related to dual use concerns as AI systems are being more and more integral parts of socio-technical systems where they play crucial roles in decision making. So just like other pieces of infrastructure, especially in the case of critical infrastructure, the security of these systems is intertwined with their possibility for misuse. For an exhaustive overview of security threats to AI systems, see the knowledge base of adversary tactics and techniques named ATLAS—Adversarial Threat Landscape for Artificial-Intelligence Systems—provided by MITRE (2023). This overview and other resources can help to keep track of AI-specific threats and assess the impact they might have on the research conducted (Hendrycks et al., 2023). Besides, there is a legion of guidelines that offer help in developing, deploying, or operating AI-systems in secure ways (BSI, 2023; ENISA, 2023; NCSC, 2022) and for AI risk-management (NIST, 2023; TTC, 2022).
The aspects mentioned in this section, stemming from the inherently digital nature of AI, increase the dual use potential of AI research, allowing tools created for lawful purposes to be repurposed for criminal activities, systems designed for beneficial aims misused for harmful ones. As AI advances and its applications become more widespread, the risk of malicious or criminal exploitation escalates.
Definitions of dual use concerning AI research
Concerns regarding dual use of AI have been discussed by prior work. Allen and Chan (2017) surveyed technologies with prominent concerns around dual use and explored case studies including chemicals, biological engineering, cryptography, and nuclear technologies for potential insights for AI dual use policymaking.
In a rather general way, Koplin (2023: 32) states that like other dual use technologies, “AI language models could be used for both beneficial or malevolent purposes” and that “compared to the life sciences, the dialogue around dual-use AI research is still at an early stage.” But there is nevertheless an already big and steadily growing body of literature that deals with potential forms of misuse of AI research and technology and that therefore can full under the umbrella of dual use considerations (e.g. Bernstein et al., 2021; Gamage et al., 2021; Grinbaum and Adomaitis, 2024; Kaffee et al, 2023; Kania, 2018; Ratner, 2021; Schmid et al., 2022; Shankar and Zare, 2022; Urbina et al., 2022). Brundage et al. (2018) define malicious use of AI loosely, “to include all practices that are intended to compromise the security of individuals, groups, or a society.” Focusing on research, Grinbaum and Adomaitis (2024) for instance argue for the benefits of applying a DURC framework to generative AI. They stress that an application of the framework could heighten political awareness of the significant role generative AI plays as a societal transformation force. According to their approach, LLMs and generative AI research can be considered as DURC. But this categorization, however, does not imply that LLM research should be prohibited or that the benefits of the technology should not be exploited: as is the case with established avenues of DURC, one main focus is on raising awareness of risks and to provide guidelines on how to encourage safety when applying AI research.
For the provision of guidance on the side of researchers, Kaffee et al. (2023) developed a checklist that can help in decision-making when dual use concerns arise about NLP research projects. The definition they use for dual use is in line with the considerations presented so far and can be seen as state of the art regarding dual use considerations in in NLP research and in AI research in general. They conceptualize “dual use as the malicious reuse of technical and research artefacts that were developed without harmful intent. Malicious reuse signifies applications that are used to harm any, and particularly marginalized groups in society, where harm describes the perceived negative impacts or consequences for members by those groups” (Kaffee et al., 2023). This is in line with the general considerations on dual use and DURC presented so far and can be seen as the state of the art regarding the ethical reflections on dual use in NLP research and AI research more generally. The key element is that there is a secondary use that is harmful and differs from the intended primary use that was guiding the research when it was conducted by an individual researcher or a group of researchers. It is worth mentioning that the definitions focus on misuse and subsequent harm encompasses both dimensions of dual use discussed, namely the civilian-military distinction and the benevolent-malicious one. In the next section, I will briefly mention some examples of these kinds of misuse of AI research to illustrate the definition.
Examples of dual use risks in AI research
In this section, I will briefly point at some of the dual use risks and the misuse cases that are being debated regarding the field of AI research in general and with a specific focus on generative AI. An established analysis provides 21 types of harm that can stem from Language Models (Weidinger et al., 2021). Based on a survey with professionals, Kaffee et al. (2023) found that the participants of the study perceive as potential harm issues like manipulation, oppression, surveillance, crime, ethics washing, cyber bullying, censorship, and plagiarism. The overview I provide in this section will give a good impression of what AI systems are being discussed regarding dual use and which cases of misuse actually occur or are seen as emerging threats for the near future. It is not intended as a ranking of the most dangerous misuse cases but supposed to provide an impression of the downstream risk landscape resulting from AI research and development.
As discussed in the introduction and the section on defining dual use, AI technology can be integrated into autonomous weapons systems, allowing for lethal action without direct human intervention, raising ethical and accountability issues in warfare (Christie et al., 2023; Sanger, 2023; Scharre, 2018; Taddeo and Floridi, 2018). Facial recognition technology can be misused for mass surveillance by governments or private entities, leading to extensive privacy violations and the monitoring of individuals without their consent (EU, 2024; Smith and Miller, 2021). Moreover, AI systems can be employed by authoritarian regimes to identify and track political dissidents, enabling targeted repression and human rights abuses (Ambrus, 2020; Carrozza et al., 2022; Feldstein, 2019; Saheb, 2023). This potential for oppressive monitoring or political control contributes to AI being a dual use technology.
AI can assist in the development or dissemination of biological and chemical agents, increasing the risk of large-scale disasters through enhanced precision and effectiveness (Shankar and Zare, 2022). Urbina et al. (2022) demonstrated that the MegaSyn model, initially designed for drug development, could be repurposed to identify potentially lethal molecules, hinting at the potential for biochemical weapon development by altering a single algorithmic parameter. Meanwhile, Boiko et al. (2023) showcased an AI system autonomously conducting experiments and synthesizing chemicals in real laboratories. And moreover, Soice et al. (2023) have highlighted how easy access to information can be extremely helpful in identifying dangerous viruses and sourcing necessary raw materials. This issue has been explored in the context of ‘forbidden knowledge’ (Hagendorff, 2019) and ‘information hazard’ (Bostrom, 2011). Adding to the problem of information hazard, the advent of cloud labs or contract research organizations means that not just specialized knowledge is not necessary, but that no specific expertise is required to manufacture hazardous substances (Arnold, 2022; Lentzos and Invernizzi, 2019).
Intentional manipulation of users, for example, by dis- or misinformation and attempted polarization is a potential misuse of AI technology, the use of language models for such tasks has been discussed as Automate Influence Operations (Goldstein et al., 2023). LLMs for instance can be used for malicious activities like generating highly persuasive disinformation, in part through creating deepfakes (Gamage et al., 2021). The rapid evolution of AI technology has drastically outpaced the development of countermeasures, such as content verification tools, watermarks, or fact-checking algorithms (Clark et al., 2021; Grinbaum and Adomaitis, 2022; Heikkilä, 2022). It is increasingly challenging to distinguish between genuine and artificial content, rendering existing content moderation and recommendation systems more and more ineffective. See for example the case of ‘spin’ attacks described by Bagdasaryan and Shmatikov (2022). As Koplin (2023) shows, one extensively debated concern is the potential use of AI text generators to produce ‘synthetic’ fake news. To date, fake news production involves human efforts and significant resources. AI text generators have the capacity to automate this process, empowering malicious users to create deceptive articles with rather low efforts. These articles could bolster particular perspectives, undermine political systems, and promote or denounce specific products, individuals, or organizations. For instance, they could be harnessed to efficiently produce social media posts promoting specific political ideologies (“astroturfing”) or inundate platforms with either positive or negative reviews for a company’s offerings. Koplin concludes that the unique aspect of AI text generators in terms of dual use pertains to their ability to generate significant volumes of deceptive or misleading content, pointing out that this capability appears consistent across diverse forms of generative AI, encompassing the creation of video and audio content as well as code. When considering large scale applications of this capability the concept of cognitive warfare is used. Cognitive warfare, stemming from prior forms like PsyOps and information warfare, heavily leverages new communication technologies, particularly AI (Miller, 2023), and targets entire populations, aiming to change behavior by altering thought patterns rather than simply providing false information on specific issues. Utilizing sophisticated psychological techniques, its goal is to destabilize institutions, notably governments, often by first destabilizing epistemic institutions such as news media and universities. This approach heavily relies on computational propaganda, where generative AI tools enable the deployment of profile-based, individually targeted communication.
It is widely discussed that AI technology can be misused to automate cyberattacks, including phishing, mass-scale social engineering, and producing malicious code (David and Paul, 2023; Gregory, 2022; Ropek, 2023). By generating convincing content tailored to specific targets, LLMs make it easier for malicious actors to weaponize language (EUROPOL, 2023; Trend Micro Research, 2020). LLMs already have the potential to revolutionize spearphishing and other types of attacks due to drastic reductions in cost and time (Hazell 2023). This availability drastically lowers the barriers to entry, and thus increases the range of actors that can engage in malicious uses.
AI systems can serve as tools to familiarize individuals with diverse potential crime areas, ranging from home intrusion techniques to terrorism, cybercrime, and instances of child sexual abuse, without prior expertise in these subjects (EUROPOL, 2023). King et al. (2019) show how AI can be misused for distinct areas of crime like commerce and financial markets, harmful and dangerous drugs, offences against persons, sexual offences, and theft and fraud. For instance, AI tools can be used to generate, distribute, and even enhance child pornography, worsening exploitation and abuse of children by making illicit content more accessible and harder to detect (Danaher, 2017; Ratner, 2021; Schreiber, 2022).
Shared responsibilities in AI research and avenues for mitigation strategies
The discussion of the dual use definitions and the temporal aspect of DURC in the second section and the examples of potential misuses of AI research in the third section have shown the need to consider dual use as an important aspect in the research and design of AI. Considering both aspects debated in discourse on dual use, namely military applications and harmful misuses, evaluating AI research presents a significant challenge (Bernstein et al., 2021; Heinrichs, 2022; ZEVEDI, 2023). Allocating responsibilities between researchers, developers, and deployers seems particularly important given the fact that general purpose AI may be advanced by researchers and released by developers with no intentions for a particular use, but then put to a wide range of downstream uses by deployers. This instantiation of Collingsridge’s dilemma shows that risk assessment for the early stages becomes more important and more challenging with the shift toward general purpose foundation models, where downstream use cases are wide-ranging and uncertain at the time of development (NAIAC, 2023). Within the sphere of research, distinct stakeholders emerge: universities and public research institutions primarily engage in fundamental research, while industrial collaborations and military-funded research facilities lean toward product development. The considerations around dual use thus span from fundamental research domains to export limitations on fully realized products.
The dual use pipeline in AI research highlights the continuous and iterative process through which AI technologies are developed, evaluated, and potentially misused (Koplin, 2023). This pipeline starts with fundamental research, where new algorithms and models are created and tested in controlled environments. As these technologies mature, they move into applied research and development, where they are integrated into practical applications and deployed across various sectors. Throughout this process, the dual use nature of AI becomes evident, as innovations intended for beneficial purposes can be repurposed for malicious activities. The dual use pipeline thus underscores the need for robust ethical guidelines, vigilant oversight, and proactive risk management strategies to ensure that the advancements in AI research are harnessed for positive outcomes while minimizing the potential for misuse.
What are measures and intervention points to assess and address dual use concerns in AI research? As a starting point for answers to these questions, Harris (2016) outlines three main objectives for dual use governance: (1) preventing the development of technologies for malicious purposes, (2) controlling access to dual use materials and information, and (3) ensuring the safe handling of these elements. Effective governance toward these objectives involves varied measures at different stages of research and development. Tucker (2012) argues that assessing the safety and security risks of emerging dual use technologies requires a flexible, iterative process that integrates new information as it becomes available. He suggests that incorporating this iterative technology assessment into the research and development cycle allows for the identification and mitigation of risks. Once these risks are identified, a tailored governance package comprising hard-law, soft-law, and informal elements is necessary to balance risks and benefits equitably among stakeholders. King et al. (2019) emphasize that risks are most effectively addressed during the research and design phases. Informal methods like self-governance within scientific communities, enforced by academic associations, societies, and RECs are crucial for managing AI dual use concerns in universities and publicly financed institutions. These bodies are main contributors for achieving compliance with ethical standards and promoting responsible conduct in AI research, thus highlighting the importance of scientific self-governance in mitigating dual use risks.
In the AI development pipeline, stakeholders include research and innovation actors such as individual researchers and engineering teams, industrial actors such as manufacturers or deployers of AI systems, ecosystem members such as open-source platforms and independent model evaluation groups, governance bodies including regulators and standardization agencies, and final users. To arrive at a working notion of responsibility, criteria need to be established for how it should be shared among the different actors in the value chain, for example, the programmers who design a foundation model and those who design control layers, the trainers who select training data, the manufacturer of the AI system and that of possible plugins, an intermediary entity using the API supplied by the manufacturer, and the final user (Grinbaum et al., 2017).
Many argue that because it is unrealistic to restrict export or distribution of digital products an approach to tackle dual use concerns at the end of the dual use pipeline seems inappropriate (Bernstein et al., 2021; Grinbaum and Adomaitis, 2024; Kaffee et al., 2023; Riebe, 2023). Shehadeh (1999) has described how export control measures like the Wassenaar Arrangement failed in regulating cryptographic algorithms and network security tools. Considering some of the similarities between cryptography and AI (both are running on general purpose hardware, are immaterial objects in the form of algorithms, have very wide ranges of legitimate applications, and are dual use as they have the ability to protect as well as harm), it is likely that export control measures will not suffice to deal with the dual use potential of AI (Brundage et al., 2018). While hard-law measures like reporting mandates and export controls are effective for dual use goods, AI research benefits more from soft-law approaches such as transparency initiatives, risk education, and global standards. Therefore, measures must be taken at the early stages, i.e. during the research phase. Until structured efforts to prevent the misuse of research are implemented throughout the research sector the responsibility somehow seems to remain with the professional researchers themselves (Kaffee et al., 2023). Or at least they are the strongest barrier to prevent the misuse of research—mainly by not pursuing research that they deem possibly harmful due to dual use considerations. Dual use checklists have been published by researchers (Kaffee et al., 2023) and university councils. The Flemish Interuniversity Council released
In bioethics, there is a debate on whether a well-developed system of self-regulation could strike a fruitful balance between respecting scientific openness and protecting society from harm. Some argue that it will work provided that scientists engage with the system in good faith (Resnik, 2010); others oppose it, saying that it might be an over-estimation of the scientists’ competences in assessing security risks (Selgelid, 2007). Regardless, there is an important disadvantage of processes such as checklists (Madaio et al., 2020), volunteer drop-ins, and product reviews (Holstein et al., 2019) when it comes to dual use: they all rely on voluntary usage, so while they are valuable, they are limited to those who self-select (Bernstein et al., 2021). Notably, in the survey already mentioned that Kaffee et al. (2023) conducted with NLP professionals, the participants stated that while dual use is perceived as a somewhat or very important issue to their research, only a minority thinks about it often or always and about a third of the participants stated that they do not take any steps to prevent the misuse of their work. Requirements by academic forums to add ethics content to research paper submissions are worthwhile but are only enforced at the end of the research process. In contrast, as I have shown in accordance with the considerations on the dual use pipeline, ethics and societal reflections are best considered at the
In any case, the need to consider dual use challenges extends beyond individual researchers; institutions, professional associations, and governments also play significant roles in influencing the likelihood of dual use discoveries and their potential misuse. Each party shares moral responsibility for any resulting harm, meaning that focusing solely on individual researchers is insufficient when dual use technology is misused. Addressing these shared responsibilities effectively requires a “web of prevention”: a comprehensive set of measures implemented by diverse stakeholders (Miller, 2018; Selgelid, 2013). This framework aims to minimize dual use risks throughout the entire research and development pipeline. Publicly funded research institutions, in particular, must not only share this responsibility but also establish structures and environments that promote ethical research while preventing harmful outcomes.
The scientific community and the domain of publicly funded research have developed measures to address evaluation of potential dual use problems arising from AI research. For LLMs, it includes a variety of testing techniques like automatically computed benchmarks on standardized datasets, penetration testing, human ‘red teaming’, etc. (Grinbaum and Adomaitis, 2024). A DURC framework should begin with testing and evaluation requirements; the results of such testing should be published alongside the model. But these efforts are only getting institutionalized step by step. One approach is the pre-registration of machine learning research, as that can address AI dual use concerns by promoting transparency and accountability in the research process (Albanie et al., 2021; Bertinetto et al., 2020; van Miltenburg et al., 2021). By requiring researchers to publicly outline their research objectives, methods, and intended outcomes before conducting their studies, pre-registration helps to ensure that projects are aligned with ethical standards and societal benefits. This practice deters malicious or unethical research by making the research intentions and processes open to scrutiny by the scientific community and regulatory bodies. Additionally, pre-registration may facilitate reproducibility and integrity in AI research, enabling better oversight and subsequently fostering trust in the development and application of AI technologies. A second approach is the already-mentioned red teaming (Ganguli et al., 2022; Ji, 2023; Perez et al., 2022), which also aims at addressing AI dual use concerns by proactively identifying and mitigating potential risks. In this approach, a group of experts, acting as adversaries, rigorously tests and challenges AI systems to uncover vulnerabilities and misuse scenarios. This process helps in anticipating malicious applications, enhancing system robustness, and ensuring that security measures are in place to prevent misuse. By simulating real-world threats, red teaming provides valuable insights into the dual use implications of AI technologies, allowing developers to implement safeguards and ethical guidelines that mitigate risks. Red teaming is an approach that could get professionalized in the future and then outsourced to oversight bodies (Rehse et al., 2024).
DURC largely relies on policy levers directed toward government-funded research. These levers are, however, not in place for many AI research surroundings (Bernstein et al., 2021). AI developers are typically funded independently, often through private corporations, and do not rely heavily on government support. The usual funding-related incentives and policy measures are not applicable to major AI developers. Furthermore, the global and diffuse nature of AI development means that regulatory efforts in one jurisdiction might not prevent misuse in another, not to mention the potential for clandestine development that evade law enforcement. On the whole, the implementation of DURC meets here new challenges that indicate a lack of “teeth” in any governance framework due to low entrance barriers and highly international character of AI in general, and generative AI in particular (Grinbaum and Adomaitis, 2024).
When it is acknowledged that scientific research and ambitious technological innovation are not going to be put to a halt, considering AI research as DURC could heighten political awareness of its significant role as a societal transformation force and highlight some of the possible misuse cases. If likewise, the necessity to address dual use concerns early during research is accepted and the reliance on the good will of individual researchers is taken to be too weak, then RECs might be a good option to evaluate dual use concerns.
The challenge of dual use AI for RECs
Assessing the multifaceted capabilities of AI within complex socio-technical systems presents a considerable challenge. In this assessment of the dual use pipeline, RECs and the scientific community hold pivotal roles. Considering that research on AI is rapidly evolving, the same must be true for impact assessments and evaluations by RECs. In today’s university environment, there are few institutional structures to facilitate computing and AI researchers in addressing issues of societal and ethical harm (Kaffee et al., 2023). Unlike other professions such as law and medicine, computing lacks widely-applied professional ethical and societal review processes and no fiduciary relationship (Bernstein et al., 2021; Grinbaum and Adomaitis, 2024). Against the backdrop of the many harmful applications and avenues of potential misuse to be considered in research on AI, it becomes clear that there is an urgent need for information and consolidation on possible dual use implications. This is even more the case for RECs who face the challenge of reviewing research proposals and only partly incorporate the expertise to do so in a fully informed manner. But trust in science and technology as well as the utility and acceptability of their most advanced outcomes depends on the ethical qualities of the research. This is the reason why research projects are submitted to an ethics review.
To date, the concepts of accountability and responsibility are somewhat ambiguous within the realm of AI research (Heinrichs, 2022; Novelli et al., 2023). Therefore, to act responsibly in research and during the review process, the relevant stakeholders—namely researchers and RECs—must be equipped to perform effective self-assessments and evaluations. Considering the nascent state of reflection and standardization common in AI evaluation processes (Raji et al., 2022a), more exchange between evaluation bodies and especially RECs would be helpful. By this, assessment outcomes could do more than set internal standards for quality assurance and ethics compliance. They could prompt further reflection for the whole research community. Collective decision-making on dual use issues can raise awareness within the public research sector for the importance of handling dual use challenges. Part of this should be a shared understanding and consistent application of concepts and terminology that include socio-technical characteristics of AI systems as well as risk, risk perception, risk tolerance, and risk management. Establishing a mutual comprehension of current risks can serve as a foundational framework for detecting and evaluating both existing and emerging risks, whether in the domain of dual use or otherwise.
One option lies in exploring the chances for developing an REC model specifically tailored for high-risk AI research could prove valuable. AI ethics committees could mirror the operational structure of established medical RECs (Véliz, 2019). Like established RECs, these bodies could encompass a comprehensive assessment, integrating considerations of dual use risks alongside other ethics concerns. Relying solely on RECs for this task would impose an excessive load on their side and isn’t particularly practical, especially considering that in many cases (and depending on the jurisdiction), researchers aren’t obligated to have their projects reviewed. However, this is different in the case of research funding. More and more often, funding institutions or publishing bodies require at least a self-assessment from researchers or an opinion from an ethics committee. This increases the workload for these committees and the complexity of the applications, particularly when long-term projects researching general purpose technologies are to be assessed for their ethical implications and potential dual use risks.
An example of an interdisciplinary ethics committee at a local university is described by Bernstein et al. (2021). They set up a review board that interacts with researchers pursuing AI research and aims at discovering potential harms, where these harms are understood in a wide sense, including harm to society, what makes this approach align well with the concept of DURC. To complement traditional Institutional Review Boards, a focus on society rather than merely individuals an Ethics and Society Review board has been developed and installed, serving as a feedback panel that works with researchers to mitigate negative ethical and societal aspects of AI research (Bernstein et al., 2021). But as the members of the ESR admit, the process works only as it is a requirement for funding: researchers cannot receive grant funding from a major AI funding program at the university until the researchers complete the ESR process for the proposal. Researchers are asked to submit a brief ESR statement alongside their grant proposal that describes their project’s most salient risks to society, to subgroups in society, and globally. This statement is supposed to articulate the principles the researchers will use to mitigate those risks, and to describe how those principles are instantiated in the research design. The next steps aim at working with the researchers to identify negative impacts and devise reasonable mitigation strategies, the iterative feedback can include raising new possible risks, helping identify collaborators or stakeholders, conversations, and brainstorming. Overall, the goal of the ESR process is described to find a lever that can inject social and ethical reflection early. Bernstein et al. (2021) convincingly argue for the importance of institutionalized processes that support and encourage individual researchers to apply abstract and general principles to their own projects, as long as legal regulations are missing. The concept of scaffolding, manifesting in the iterative process and a kind of directing guidance, plays a significant role in this. Firstly, it provides orientation in an unfamiliar field. Secondly, it offers predetermined slots for reflection, ensuring that considerations stay focused. Feedback from experts in research ethics also helps in this regard, especially since high-level principles can be challenging to enact (Holstein et al., 2019). This approach shows how a well-established self-regulatory system could effectively balance scientific openness and societal protection from harm, assuming scientists actively participate in the system with genuine intent (Resnik, 2010). As this kind of evaluation is far from universally established and only works in the specific case due to the combination with a funding program, it’s crucial to address the current lack of comprehensive guidance available to AI researchers dealing with dual use issues within their respective research institutions.
Guidelines and checklists like the ones mentioned (Kaffee et al., 2023; VLIR, 2022) can be very helpful for individual researchers and exchange within the scientific community, for instance when topics from the checklists are integrated into publications. Directed at RECs, the Centre Responsible Digitality’s directives are tailored for decisions regarding dual use in research proposals. These
A notable example of an independent and nonprofit organization assessing the risks of AI development is METR, which conducts “Model Evaluation and Threat Research” (2024a). They have developed a set of responsible scaling policies specifying the level of AI capabilities an AI developer can safely handle with current protective measures, and the conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures have improved (METR, 2024b). Although METR collaborates with numerous AI developers like OpenAI, it faces the same issue as oversight by RECs: there is currently no requirement to participate in an assessment, and there are no means to enforce measures. As METR (2024a) states: “We think it’s important for there to be third-party evaluators with formal arrangements and access commitments—both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations. We do not yet have such arrangements, but we are excited about taking more steps in this direction.”
Besides further contributions to the theory of research ethics focusing on dual use aspects of AI research, it is initiatives like those that are pivotal in consolidating concerns around AI dual use research. Active involvement by researchers from computer science, ethics, and other disciplines, members of ethics committees, experts on dual use, associates of science management boards, and the wider scientific community is crucial for fostering a collective comprehension of the challenges posed by AI research. This will also facilitate the exchange of ideas regarding the role of RECs in addressing these issues.
The composition of RECs and their integration into the multifaceted research environment remain unclear and continue to be topics of interest not only in research ethics but also in broader discussions about higher education and research policies (Brenneis et al., 2024). Several options are emerging for the establishment of local RECs. One approach is to form committees that focus on the technical aspects of potential AI dual use research, incorporating domain-specific expertise as necessary (Raji et al., 2022b). Another approach is to create inherently interdisciplinary committees that encompass all relevant areas of expertise. A third option is to have research ethics experts serve as the primary point of contact, organizing tailored evaluations and support for each research proposal by pooling expertise from relevant fields. This latter model could also be implemented at a level above local universities, leveraging a network of specialized expertise similar to existing measures that ensure scientific quality, such as double-blind reviews.
Conclusion
Researchers and RECs alike face significant challenges when assessing AI research with dual use potential. One primary difficulty is the complexity and specialized nature of AI technologies, which often require specific technical expertise that committee members may lack. This gap in knowledge can hinder their ability to fully understand and evaluate the potential dual use implications of AI research. Additionally, the rapid pace of technological advancement in AI outstrips the development of ethical guidelines and regulatory frameworks, necessitating continuous updates to the committee’s knowledge base and assessment criteria. The ambiguous nature of dual use potential further complicates this task, as predicting possible misuse requires significant foresight and imaginative thinking. Balancing the potential benefits of AI research against the risks of misuse presents another challenge. RECs must make nuanced judgments about the likelihood and severity of potential harms versus the prospective gains from the research, and so in the context of insufficient or evolving ethical and legal frameworks. Ensuring transparency and accountability in AI research while protecting proprietary information and intellectual property rights adds to the complexity. Interdisciplinary collaboration is essential yet challenging to coordinate, requiring input from fields such as computer science, ethics, law, and security. Moreover, the global implications of AI research necessitate that RECs consider international perspectives and potential cross-border impacts, adding another layer of complexity to their evaluations. Addressing these challenges requires ongoing education, robust frameworks, and effective interdisciplinary collaboration to guide ethical assessments of AI research with dual use potential.
The article showed that there are strong arguments provided by many researchers that research on and with AI in general and with LLMs and generative AI in particular is prone to dual use. Considerations on the dual use pipeline in combination with Collingridge’s dilemma underscored that the early stages of research are decisive for evaluating dual use risks in AI research, especially because general purpose technologies that are easy to disseminate can come with massive risks. At the same time, research ethics regarding AI research are developing quickly but have not reached a phase of consolidation like it is the case in medical research or research areas labeled as DURC. But as the examples of misuse show and the extrapolations for possible misuse cases suggest, there is an urgent need for more discussion regarding dual use aspects in AI research and how to mitigate them. By highlighting the role that RECs can play in that process, this article provides a basis for further elaborations on how to mitigate dual use risks in AI research. The more attention that various stakeholders such as individual RECs (Bernstein et al., 2021), approaches like checklists from the scientific community (Kaffee et al., 2023; VLIR, 2022; ZEVEDI, 2023), publicly funded research projects (CHANGER, 2024; IRECS, 2024), and independent organizations (METR, 2024a) give to the topic of dual use in AI research, the more individual approaches to raising awareness and assessing specific research projects will be heard. We are in a negotiation process within a field of scientific research that is currently consolidating. Therefore, all efforts in this area are to be welcomed, even if it is not yet clear how dual use regulation of AI research will ultimately take a binding form.
Footnotes
Acknowledgements
The author is grateful to the two anonymous reviewers for their very detailed and invaluable comments on the first version of this article, which significantly contributed to its improvement.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
All articles in Research Ethics are published as open access. There are no submission charges and no Article Processing Charges as these are fully funded by institutions through Knowledge Unlatched, resulting in no direct charge to authors. For more information about Knowledge Unlatched please see here:
.
Ethics approval
The author declares that research ethics approval was not required for this study.
