Abstract
Due to the potential scope and impact of artificial intelligence's (AI's) adoption in medicine, a comprehensive assessment of the potential ethical considerations arising during clinical integration is needed. Existing ethical frameworks and principles have aided in the identification of ethical considerations that may arise during the clinical integration of AI; however, our understanding of these considerations remains preliminary to the extent that it is not yet robustly informed by empirical research on key stakeholders' experiences and perspectives. Utilizing a qualitative descriptive approach, we completed in-depth semi-structured interviews with physicians (n = 11) and AI researchers (n = 10) who had experience developing or using clinical AI regarding ethical considerations that they have perceived in relation to their work. An analysis of the interviews identified considerations related to information sharing and understanding, the risks and systemic impacts of clinical AI, and opportunities for safeguards. Physicians and AI researchers raised questions about how much information and understanding is needed by both physicians and patients in order for AI to be used ethically in clinical practice, agreed that unintended impacts associated with clinical AI could pose threats to patient autonomy, and advocated for more diligent and thoughtful regulation of clinical AI innovation.
Introduction
Innovative technologies leveraging artificial intelligence (AI) offer immense promise for clinical care, from improving systems for prediction, diagnosis, and treatment to enhancing clinical workflows and decision support systems (Caballé-Cervigón et al., 2020; Rajpurkar et al., 2022; Stafford et al., 2020). Achieving robust ethics guidance and safeguards for implementing AI-based tools is widely recognized as critical to ensuring that these tools will be adopted safely and effectively in clinical settings (Morley et al., 2020). Early ethics work relating to AI in medicine has identified ethically relevant technical considerations, including issues related to generalizability and bias (Obermeyer et al., 2019), concerns regarding the privacy and security of systems that utilize and store health data (Iacobucci, 2017; Price and Cohen, 2019), and the importance of explainable AI (Amann et al., 2020); such findings have prompted frameworks for systematic ethical appraisal of the clinical AI development process (Char et al., 2018, 2020). More recently, some ethical frameworks and principles have highlighted the merits of understanding clinical AI integration as a "sociotechnical" problem and, thus, of assessing novel tools within the context of their use and in relation to their possible impact on immediate stakeholders (McCradden et al., 2023; Sand et al., 2022).
The above frameworks have demonstrated value in identifying and assessing ethically consequential aspects of the integration of clinical AI. Yet more must be understood about these issues before AI can be operationalized in practice (Prem, 2023). Recent systematic reviews of the ethics literature in clinical AI have identified a disconnection between high-level frameworks and principles (e.g. the Belmont principles: beneficence, non-maleficence, autonomy, and justice; National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979) and the limited empirical ethics literature (Murphy et al., 2021; Tang et al., 2023). Indeed, more empirical research is essential to develop and operationalize high-level guidance (i.e. what "should" be the case) that is adequately attuned to real-world experience (i.e. what "is") and can anticipate possible ethical tensions (Racine and Cascio, 2020). Due to the limited number of AI tools currently utilized in clinical practice (He et al., 2019; Shaw et al., 2019), such ethical frameworks have necessarily relied on assumptions about the risks and consequences of integrating AI in clinical care, and there has yet to be sufficient empirical or stakeholder-based research to test these assumptions. As noted by Murphy et al. in their 2021 scoping review of 103 ethics articles regarding AI in medicine, "Those leading the discussion on the ethics of AI in health seldom mentioned engagement with the end users and beneficiaries whose voices they were representing," and "without better understanding the perspectives of end users, we risk confining the ethics discourse to the hypothetical, devoid of the realities of everyday life" (Murphy et al., 2021).
Recent qualitative studies analyzing clinician attitudes toward real and hypothetical implementations of clinical AI tools have begun to fill this gap in the literature, providing an improved understanding of the ethical concerns raised in recent guidance and identifying novel concerns. Van Cauwenberge et al. (2022) and Tanaka et al. (2023) found that, in interviews relating to the incorporation of specific AI clinical decision support systems, physicians distinguished between parts of their job that they believed could and could not be effectively automated or replaced by AI, with these distinctions potentially based on aspects of their roles that they felt were crucial to their identities as physicians and human beings. In a broad interview study assessing GPs' attitudes toward clinical AI, Buck et al. (2022) found concerns relating to the maintenance of the physician–patient relationship in the context of clinical AI, supporting Samhammer et al.'s (2022) findings in which physicians described their belief that future AI decision support systems will need to uphold the "classical strengths" of the medical profession, primarily the shared decision-making model. The Samhammer et al. (2022) study additionally found that physicians distinguish between transparency and explainability, with a preference for understanding the efficacy of a clinical AI tool over its inner workings, complementing similar findings by Dlugatch et al. (2024).
These findings demonstrate the effectiveness of a stakeholder approach in identifying and describing ethically salient issues in the integration of clinical AI, including considerations for the relationship between physicians and AI, the maintenance of the therapeutic relationship and shared decision-making, and the perceived value of explainable AI. However, these studies were not intended to directly assess ethical views and often queried perspectives relating to specific real or hypothetical implementations of clinical AI. Furthermore, existing studies have primarily focused on physicians, and few have included other stakeholders, such as the AI researchers who design and develop such tools for use in medicine. Tech workers' ethical views regarding AI have been qualitatively assessed more broadly (Browne et al., 2024), but Tang et al. (2023), in a systematic review of empirical studies in medical AI, found no studies focusing on AI researchers' views. Prior studies by our team were among the first to examine AI researcher views regarding the algorithm development phase in medicine (Kasun et al., 2023; Kim et al., 2023), but not the integration phase. Notably, aside from our own work, no studies have directly queried both physicians and AI researchers about their ethical views, indicating that the full landscape of ethical considerations has yet to be empirically charted.
Due to the immense scope and impact of AI's adoption in medicine, a comprehensive assessment of the potential ethical issues arising during clinical integration is needed. While ethical principles and frameworks have provided a necessary first step in identifying such issues, exploration of stakeholder perspectives is critical to the ethical, and ultimately successful, adoption of AI in clinical practice. In light of the gap in knowledge stated above, we sought to understand the ethical implications associated with clinical AI integration as perceived by immediate stakeholders, namely, AI researchers and physicians. Utilizing an open-ended qualitative approach, we interviewed AI researchers and physicians who had experience developing or using clinical AI and who could therefore offer sufficiently informed views regarding its uptake.
Methods
Study design
The purpose of this interview study, which is part of a broader project (NCATS R01-TR-003505), was to describe the views of AI researchers and physicians regarding ethical considerations that they have encountered or anticipated in the development and application of AI in medicine (Kim, 2021). A qualitative descriptive approach was applied to the study design and data analysis, as this approach is predicated on the description and analysis of stakeholders’ experiences and perceptions using language and ideas that emerge directly from the stakeholders themselves (Neergaard et al., 2009; Sandelowski, 2000). Whereas other methods of qualitative research and analysis prioritize the development or advancement of theory, qualitative description is most conducive to the emergence of novel considerations that may not yet be fully understood and that are grounded in stakeholder experiences. Considering the limited amount of literature describing stakeholder perspectives relating to ethics in clinical AI, qualitative description was identified as the most appropriate method for analyzing the results of this study.
Participants and procedures
A purposeful sampling technique (Palinkas et al., 2015) was used to identify and recruit participants who were especially knowledgeable about the topic being studied: machine learning (ML) researchers who had experience developing AI or ML tools for use in medicine, and physicians who had experience developing or using such tools in their clinical practice. A total of 61 potential candidates (33 ML researchers and 28 physicians) from 10 U.S.-based R1 research universities (Carnegie Classification of Institutions of Higher Education, n.d.) were identified by consulting relevant literature and seeking recommendations from experts in the fields of AI and ML, medicine, and AI ethics.
Recruitment emails containing details about our project and an electronic interest form were sent to these 61 potential participants; 29 completed the interest form, of whom 21 (10 AI researchers, 11 physicians) scheduled and completed an interview. Interviews continued until no new substantive themes emerged from additional data collection and content saturation was reached (Saunders et al., 2018). The final cohort included participants from six U.S. R1 research universities. Participants in the AI researcher group held, at minimum, master's degrees, were working toward a Ph.D. in computer science or a related field, and represented academic departments of biomedical informatics, engineering, and computer science; participants in the physician group held M.D.s and represented the specialties of radiology, psychiatry, and surgery. Complete demographic information for participants can be found in the Supplemental Material.
Semi-structured interviews were used in this study to facilitate in-depth, open-ended discussion. The interview guide consisted of seven primary questions regarding the ethical considerations of AI in medicine; relevant follow-up questions were asked in response to participants' replies to the primary questions. Interview questions were purposefully broad to allow participants the opportunity to address the areas of clinical AI that they found most ethically significant, as opposed to responding to topics or ideas defined a priori by our research team. Participants could thus identify ethical considerations that they had experienced in their own work but that have not yet been adequately described in the literature. Interviews were completed via Zoom between November 2020 and April 2021 and were conducted by one of our team's four trained interviewers. Interviews averaged 52 minutes, 6 seconds (range: 29 to 95 minutes) and were audio recorded for transcription.
Data coding
Interviews were transcribed and de-identified by members of our research team. Data analysis was completed using conventional qualitative content analysis, which involves breaking down transcribed data into descriptive units that are named and sorted based on their substantive content (Downe-Wamboldt, 1992; Hsieh and Shannon, 2005). This method allows for the development of codes and themes that are inductively derived from the data, rather than applied based on a priori assumptions about its content. As is common in qualitative research, we aimed to achieve consistency in our qualitative coding and analysis (Noble and Smith, 2015). The coding (i.e. the labeling of the data) and analysis (i.e. the sorting and describing of the data) processes each consisted of a series of steps that are depicted in the Supplemental Material.
An initial round of open coding was performed on each of the transcripts by two randomly assigned research team members, who independently identified substantive content from the interview text and suggested potential code labels and definitions descriptive of the content. Upon completion of a round of open coding, the full research team held five meetings to discuss the reviewed transcripts, as well as the rationale for suggested codes and definitions. In cases where individuals who had reviewed the same transcript had code labels or definitions that diverged, they collaboratively refined the code labels and definitions with input from the full team until agreement was reached. At the end of this series of meetings, the first draft of the codebook was established.
A round of intermediate coding was then performed in which a subset of transcripts was again reviewed by two randomly assigned research team members, this time using the draft of the codebook as a guide. Each team member labeled the content within their assigned transcripts using the code labels and definitions in the codebook, specifically highlighting content that they felt was not adequately described by any existing codes. Upon the completion of intermediate coding, the full research team met twice more to compare the coded units, refine the code names and definitions, and incorporate new codes and definitions as needed based on the results of intermediate coding. In cases where individuals who had reviewed the same transcript had coded the content differently, they described the rationale for their decision to the full team, who worked together to refine the code labels and definitions as needed to ensure consistent future coding. The result of these collaborative meetings was the final version of the codebook, which contained 30 codes and definitions describing categories that emerged from the content of the interviews, as well as example quotes for each code.
Interview transcripts and the codebook were uploaded to NVivo 20 for final coding, which was completed by a single author (KR) with a background in qualitative coding and analysis in order to ensure coding consistency across all transcripts. The final coded transcripts were reviewed by the principal investigator (JPK) to ensure that codes were applied thoroughly and consistently.
Data analysis
After final coding, the full research team performed qualitative content analysis on the data set (Downe-Wamboldt, 1992; Hsieh and Shannon, 2005), as depicted in the Supplemental Material. Upon an initial review of the 30 codes and their content, it became clear that participants had addressed topics relating to three distinct phases of AI development for medicine: the problem formulation phase (Phase 1), the algorithm development phase (Phase 2), and the clinical integration phase (Phase 3) (see Figure 1). In order to reflect the depth at which participants addressed these topics, the results presented in this manuscript describe findings from the analysis of codes related to Phase 3. The analysis of findings related to Phases 1 and 2 has been published in earlier reports (Kasun et al., 2023; Kim et al., 2023).

Figure 1. Inductive categories and codes derived from interviews with physicians and AI researchers. Adapted from: Kim JP, Ryan K, Kasun M, et al. (2023) Physicians' and machine learning researchers' perspectives on ethical issues in the early development of clinical machine learning tools: Qualitative interview study. JMIR AI 2(1): e47449. ¹These codes contained content related to multiple phase categories and are therefore listed more than once in this figure.
After the coded content had been sorted into phases, the full research team continued analysis in order to identify major categories describing the overarching themes that emerged within each phase. After major categories had been identified, the team continued to analyze and sort the data at a more granular level, identifying subthemes that described specific ideas and topics pertaining to the larger categories. The team endeavored to develop subthemes that were descriptive and inclusive of all related content, whether that content represented agreement amongst those who spoke about a topic or offered varied or diverging views. Where a diverging view was expressed, it was explicitly noted.
Ethics review
This study obtained human subjects research approval from the Institutional Review Board of Stanford University on November 16, 2020 (#58118). A PDF copy of the IRB-approved informed consent form was emailed to all potential participants; participants provided verbal consent prior to the start of the interview. Participants were compensated in the form of a $150 electronic gift card.
Results
All 21 participants in this project raised considerations related to the integration of AI in medicine (Phase 3, as depicted in Figure 1). From the 15 inductive codes associated with Phase 3, three major categories and nine subthemes emerged (see Figure 2). These categories and their related subthemes are described in detail in this section.

Figure 2. Major categories, affiliated codes, and related subthemes describing ethical considerations in the integration of clinical AI, as described by AI researchers and physicians.
Information sharing and understanding
Physician understanding of clinical AI
Physician interviewees in this study grappled with the question of what level of understanding they should have about an algorithm and its limitations in order to act as "conscientious users" [Table 1, Section 1: Quote 15.2]. They felt that their ability to assess and understand AI tools could influence their confidence in a given device and their willingness to incorporate it into clinical practice. As stated by one physician: "You're asking clinicians to make decisions based on some kind of output that they may not necessarily fully understand, and that's confusing. I think that's not something that clinicians will be comfortable doing" [1, 1: 15.1]. Physicians and researchers commented that healthcare providers will need to receive training in how to interpret and incorporate the results of clinical AI, and how to identify its limitations [1, 1: 06, 20.1], with one participant summarizing that providers will need to learn "how to interpret algorithms, how to understand what AI does and does not do, and how to understand the inherent biases that are in AI so that they understand [not only] how the tool can work, but also the limits of it" [1, 1: 20.1]. Physicians and researchers agreed that physicians do not currently receive the training necessary to understand clinical AI tools sufficiently and make the requisite judgments regarding how to apply them in patient care [1, 1: 10, 20.2, 22, 33].
Table 1. Selected quotes from AI researchers and physicians regarding information sharing and understanding.
Determining the appropriate balance of information to communicate to patients
Physicians in this study elevated the importance of information sharing and the shared decision-making model. However, they struggled with questions about the acceptable level and types of information to share with patients regarding the AI used in their care. Several physicians referenced certain "practical" barriers that may limit providers' ability to fully inform patients about the use of or risks associated with such tools, one being that the complexity of the underlying algorithms may make them challenging to explain during the informed consent process [Table 1, Section 2: Quotes 12, 14, 25.1]. These physicians agreed that, when it comes to clinical AI, it is especially easy to provide detail that is "exhausted to the point of being uninformative" [1, 2: 12], and questioned whether it is "beneficial to the patient's care to provide potentially extraneous information that might actually not help care and may even hinder care?" [1, 2: 15]. When discussing the appropriate level of information to share with patients regarding clinical AI, several physicians referenced how many processes in medicine, including those involving other types of medical devices or technology, are often not fully explained to patients [1, 2: 15, 25.2], but they were hesitant to conclude whether clinical AI devices should be regarded similarly.
The role of explainability in clinical decision-making
Whereas AI researchers in this study agreed on the importance of prioritizing explainability while developing clinical AI (Kasun et al., 2023), physicians expressed more varied opinions regarding the role of explainability in AI's clinical implementation and use at the point of care. One physician described how explainability contributes to physicians’ ability to provide quality patient care and support robust shared decision-making, noting, “Patients are supposed to know why you’re making certain decisions. If you’re telling them, ‘I’m making a decision because that's what the algorithm said,’ I think it's questionable whether the patient has enough information in that scenario, because you can't explain why the algorithm made that decision” [Table 1, Section 3: Quote 22]. Others referenced how low-explainability algorithms could contribute to bias [1, 3: 20.1, 25]. One of these physicians, however, clarified that there are limits to the amount of explainability that they would find helpful, noting “I don’t need to know the specific data points… I want to be able to understand the result that it's giving and how I should factor that into my decision-making” [1, 3: 20.1]. One physician also questioned the assumption that increased explainability is always beneficial, asking “does transparency have a price?” in recognition of the possibility that providing physicians with more information about an AI tool's decision could have unexpected, and not necessarily positive, impacts on how they understand, view, or treat their patients [1, 3: 20.2].
Risks and unintended impacts of clinical AI
Challenges of predicting the impacts of clinical AI
Physicians and researchers in this study expressed concerns that the combined novelty, speed, and scale of AI innovation could entail new risks for patients, physicians, or healthcare systems that stakeholders may not necessarily be able to anticipate. One researcher described the evolution of AI and its accompanying risks as a “branching tree,” with each new innovation resulting in a host of possible ethical dilemmas and risks [Table 2, Section 1: Quote 13]. A physician highlighted the effect that the scale of AI integration could have on consequent risks, arguing, “The degree of ripple effects is directly correlated with the scale at which your solution is deployed… And with AI it's so easy, it's so cheap to have one product actually be used by a lot of people” [2, 1: 15]. Other participants expressed concerns about the possibility of clinical AI tools negatively impacting health outcomes, even when the tools perform as expected. A researcher argued that it is difficult to predict these secondary risks of AI tools, noting, “We can't just assume that giving people that score is going to lead to better outcomes. For particularly high risk [patients], it may lead to physicians intervening in more aggressive ways than they otherwise would and causing harm, or any number of different things” [2, 1: 18].
Table 2. Selected quotes from AI researchers and physicians regarding risks and unintended impacts of clinical AI.
Impacts on patient autonomy and the therapeutic relationship
Physicians and AI researchers described aspects of patient autonomy and the shared decision-making process that they felt could be impacted by the widespread use of clinical AI tools, such as the possibility of shifting attention away from interpersonal aspects of care and limiting the set of alternatives in the decision-making process [Table 2, Section 2: Quotes 06.1, 15.1, 15.2, 26, 27]. One physician summarized: "[Physicians] ultimately allow the patient autonomy, and I wonder if the AI would take away some of the patient's autonomy, because they're like, 'this is the only right way,' versus 'these are the options which may not have the best outcomes, but are still options'" [2, 2: 27]. One researcher expressed a related concern that over-integration of an AI system could limit patients' ability to seek a second opinion that is truly independent: "[Right now] you can just choose a different [physician] to talk to, whereas if there's only one AI, then what do you do?" [2, 2: 06.1] Underlying many of these comments was a concern about the potential impact that AI could have on the therapeutic relationship, which physicians did not feel could be substituted with technology: "There is this important part of a therapeutic relationship where you're just with someone… I think that patients really appreciate that and it just makes a difference" [2, 2: 15.2].
Other physicians and researchers discussed patient autonomy in the context of patients’ ability to decide when and how much AI should be incorporated in their medical care. A researcher asked, “How much agency do the people have in not having their data included… or not using the system at all?,” which resonated with a physician's question, “To what extent should patients be able to opt out of their provider having these digital tools running in the background?” [2, 2: 06.2, 20]. A different researcher anticipated that a possible ethical tension between beneficence and voluntarism could emerge in cases where an AI tool provides a clear and objective health benefit (i.e. more accurate health risk detection) that nonetheless stems from AI monitoring to which patients may not have consented, noting that “Sometimes [at-risk patients identified by AI] don't want to be detected” [2, 2: 10].
Impacts relating to fairness and equity
There was agreement among the physicians that clinical AI has the potential to benefit different patient populations disproportionately, building on points made by AI researchers when discussing the development phase (Kasun et al., 2023). Multiple physicians discussed the limited generalizability of the datasets used to train AI systems, due to differential access to health care and differential use of the technologies that collect such information [Table 2, Section 3: Quotes 25.1, 27]. One physician expressed worry about a future in which AI could potentially "create a different class," where "some people only get the digital interventions and then the people that can afford it get the premium face to face" [2, 3: 25.2]. Despite these concerns, several physicians agreed that medical AI could potentially "enable broader access" to needed healthcare [2, 3: 23], and that even if AI provides a lower quality of care, "it probably is more beneficial to give a poorer [AI] therapy to a lot of people than a very good [in-person] therapy to a few people" [2, 3: 17].
Heterogeneity of risk of clinical AI
In general, physicians and researchers agreed that the risks and benefits of any AI tool are highly heterogeneous and dependent on the context in which it is used, with one researcher noting, "The ethical considerations are very context specific" [Table 2, Section 4: Quote 33], and a physician stating, "Each clinical problem has its own risk/benefit profile" [2, 4: 14]. Participants said that multiple aspects of clinical integration, including the domain of use (e.g. prediction, diagnosis, or intervention), the risks accompanying the health concern, and the level of expert oversight, are important considerations when assessing the risk/benefit profile of a medical AI tool [2, 4: 13, 20, 25].
Safeguards for clinical AI
The relationship between physicians and AI
When discussing both the current state and the future of AI in medicine, physicians and researchers noted that human moderation of AI tools is necessary for the acceptance and advancement of such tools. Physicians and AI researchers agreed that fully autonomous AI in medicine is not likely, with one researcher emphasizing that "leaving the doctor to make decisions is crucial and not having any part of the pipeline without a human involved" [Table 3, Section 1: Quote 06]. Applications that attempt to complete tasks end-to-end without clinician input were not expected to be successful [3, 1: 05], in part because it was expected that "people will not relinquish that level of control and oversight to something that can kill people" [3, 1: 23.2].
Table 3. Selected quotes from AI researchers and physicians regarding safeguards for clinical AI.
Participants in both groups stated that they expect the most successful applications of AI in medicine to be ones that "change the everyday workflow of the clinician so they can focus on what's most important" [3, 1: 11]. Physicians expressed excitement about the possibilities of AI in medicine, with several referencing how physicians' current strengths can be supplemented by AI and suggesting that "the human plus the AI is better than either one alone" [3, 1: 14.1, 20]. Several physicians viewed medical AI as the next step of technological advancement in medicine, which has historically incorporated new technologies in order to improve care delivery systems and outcomes, and which will continue to require the expertise of trained physicians to correctly operate and interpret. For instance, one radiologist observed that medicine has "been bringing high technology in service of patients for over a century, and [AI] is just going to be another one of those" [3, 1: 14.2].
Increasing regulation while encouraging innovation
Participants in both groups agreed that AI is currently under-regulated in medicine, but there was no consensus regarding how to enhance regulation while continuing to encourage innovation. Researchers who commented on this topic endorsed more robust regulation in order to improve trust in medical AI, commenting that a "top-down protocol…will be utterly necessary to build widespread trust" [Table 3, Section 2: Quote 10], and that "regulation is the only way" to build this trust [3, 2: 06]. Physicians considered medical AI tools to be medical devices requiring regulation, certification, or oversight [3, 2: 12.1, 14, 17], with one noting that their comfort level regarding the use of AI in patient care depends on regulatory approval: "I would not put anything on a patient's report… that would have potential risk if it didn't have some type of regulatory approval" [3, 2: 23.1]. While encouraging more formalized regulation of medical AI, physicians simultaneously worried that "over regulation of the field might slow down the development of AI" [3, 2: 23.2], which they agreed "would be bad for patients, ultimately" [3, 2: 12.2].
Discussion
In this open-ended qualitative study analyzing the ethical perspectives of physicians and AI researchers with experience developing or using AI in medicine, participants identified a series of considerations relating to the clinical integration of AI. Three main findings, described in detail below, emerged from this analysis: (1) there were many open questions about how clinical AI may impact physician and patient decision-making, including how much each party needs to be informed about and understand in order to support its ethical use; (2) a collaborative physician-AI relationship was seen as the most acceptable path forward for mitigating potential risks, anticipating possible systemic or otherwise unintended impacts, and promoting patient autonomy; and (3) increased regulation of clinical AI appeared likely to improve physician trust in these tools but was thought to entail unique accompanying challenges, including potential impacts on innovation.
The tension between information and understanding
Physicians in this study cited how explainability could increase physicians' comfort and confidence in using an AI tool, in part because of their desire to be able to fully explain and justify their decision-making process to patients. In their discussions, however, physicians prioritized access to information regarding a clinical AI tool's risks, benefits, and limitations over the ability to pinpoint the specific data points behind an AI decision. Although explainable AI has been widely identified in the ethics literature as necessary to AI's adoption in medicine (Amann et al., 2020; Kiener, 2021; Kundu, 2021; Liu et al., 2022), numerous recent stakeholder studies have found that physicians prefer contextual, model-agnostic information (Diprose et al., 2020; Dlugatch et al., 2024; Samhammer et al., 2022). These findings demonstrate that there are likely limits to the amount of explainability that physicians view as beneficial, and that they may prefer access to information that is directly relevant to their ability to provide patient care, such as the risks and benefits of a given tool, its performance in different subpopulations, and guidance regarding how best to incorporate its results into clinical practice.
Several empirical studies have found that, while explainable AI contributes to increased physician trust in these systems, the clinical benefits of such transparency have yet to be demonstrated (Clement et al., 2021; Diprose et al., 2020; Markus et al., 2021). Participants in our study provided potential context for these prior findings in their discussions about physician understanding of AI. They acknowledged that increased explainability or additional information about clinical AI will be advantageous to the extent that physicians have the ability to effectively assess the merits and limitations of an AI tool and accurately interpret and apply its results. Moreover, they noted that physician training in how to analyze and interpret AI, or perhaps even the involvement of technical experts, will likely be needed. Samhammer et al. (2022) came to a similar conclusion, finding that AI explanations must be interpreted in context and rely on physicians to apply their understanding of the system relative to their clinical goals. Efforts to develop more explainable AI should therefore be accompanied by efforts to train physicians in how to assess and interpret such programs, as explainability or transparency without understanding is unlikely to benefit physicians or patients. This is especially pertinent considering the limited amount of research that has been done to determine how best to incorporate AI into the medical curriculum (Grunhut et al., 2021).
Discussion regarding the tension between information sharing and understanding extended to patient care as well. Physicians were unsure how best to inform patients about the use of AI in their care, and whether that information would be beneficial to, or even desired by, patients. Recent studies have found that revealing information about an AI model's performance increases patients' reported trust in and perceived usefulness of the model, and that informing patients about the use of AI in their care may improve their satisfaction (Sun et al., 2023; Zhang et al., 2021). Physicians in our study, however, acknowledged issues related to patient understanding and healthcare logistics that they felt could limit their ability to effectively share this information with patients. They described how a practical limit to the level of information physicians share with patients already exists for other technologies used in patient care, but they were less sure whether AI is merely another case calling for this professional discretion. Reconciling patients' desire for information about clinical AI with physicians' ability to provide it in an effective manner that does not upend interrelated clinical processes is a crucial step toward ensuring the future ethical use of these tools.
Mitigating risks and unintended impacts through physician-AI collaboration
When discussing potential risks associated with integrating AI in medicine, participants in this study expressed notable concern regarding the possibility of impacts on healthcare systems, physicians, and patients that they considered to be unintended. These concerns seemed to be heightened by the potential for wide deployment of AI tools; the many possibilities for "branching" or "rippling" effects; and the heterogeneity of medical AI applications, which entail risks that are specific to the tool, the characteristics of the target population, and the timing or type of the intervention. Participants' wariness of possible unintended or unidentified risks that may develop as a systemic consequence of widespread adoption, or that may be impossible to identify immediately, does not appear to have been previously reported in the literature, and raises the question of how it may influence physicians' willingness to use AI tools in their clinical practice. More research is needed to explore and identify secondary consequences of AI integration, including how clinicians' judgments and behavior may be influenced by AI tools, and how the impact of AI integration may bleed over into other clinical activities that it was not initially expected to influence. Furthermore, as ethical guidance for clinical AI progresses, it should be noted that understanding clinical AI's risk/benefit profiles may be as much a matter of evaluating specific tools as of developing general guidance, and that safeguards will ultimately need to be tailored to these specific risk/benefit profiles.
Because of the lack of confidence in being able to predict or account for many of the effects of AI tools in medicine, participants in this study strongly advocated for clinical AI that serves as a supplement to physician expertise, not as a substitute. The involvement of a human professional who reviews the AI output was viewed as a key safeguard against potential risks and unintended consequences associated with medical AI. This finding adds to similar results from many other studies, which identified collaborations between physicians and AI that support the physician-patient relationship as the most desirable future for AI in medicine (Ahuja, 2019; Čartolovni et al., 2023; Nelson et al., 2020; Van Cauwenberge et al., 2022; Yang et al., 2022).
Participants in this study notably viewed the involvement of physicians as a way to protect against threats to patient autonomy that could emerge as a result of increased integration of AI into clinical care, which was identified as a significant "unintended" consequence (McDougall, 2019; Sauerbrei et al., 2023). They noted that, within the shared decision-making model common in current medical practice, patients generally are able to express their autonomy in a variety of ways: by choosing their physicians, advocating for their health needs and concerns, sharing additional information or perspectives that may help their physicians to refine their advice (e.g. by prompting counterfactual reasoning), requesting second opinions, opting out of certain treatments or interventions, and working with their physicians to determine preferred treatment directions. Participants worried that clinical AI could diminish patient autonomy in these scenarios, as AI was viewed as less flexible in its decision-making than physicians, and the possibility of single, widespread AI systems was seen as a limit to patient choice. Interestingly, while prior qualitative studies connected physicians' preference for physician-AI collaboration to their desire to preserve their own autonomy in clinical scenarios (Tanaka et al., 2023; Van Cauwenberge et al., 2022), physicians in this study qualified that they also desired to mitigate potential risks to patient autonomy.
Regulation and considerations for the ethical advancement of AI in medicine
Physicians and AI researchers in this study called for increased regulation of clinical AI, with physicians in particular referencing the role of the U.S. Food and Drug Administration (FDA). Researchers identified how regulation would increase trust in AI tools, supporting the argument made by Kerasidou et al. (2022) that strong legal and regulatory frameworks form the basis from which trust in medical AI could emerge. Physicians agreed that FDA approval would increase their willingness to use AI tools, but they also recognized the challenges of regulating AI and worried that over-regulation could stymie needed innovation. Calls for increased federal regulation of clinical AI are common; however, recent reviews have documented numerous deficiencies in the current FDA model for regulating clinical devices, including the use of traditional 510(k) pathways that do not require clinical evidence of safety, effectiveness, or equity; the use of non-equivalent devices as predicates; and the limited number of clinical evaluations (Gerke, 2021; Lee et al., 2023; Parikh et al., 2019).
In recognition of these limitations in the extent and quality of the regulation of clinical AI, caution should be applied to ensure that the broad agreement that ethical AI in medicine requires a human (i.e. a physician) in the loop (Ahuja, 2019; Čartolovni et al., 2023; Yang et al., 2022) is not used as an opportunity to place the onus of the ethical use of AI solely on physicians. While physician judgment grounded in professionalism and facility with ethical principles is an extremely valuable resource for ensuring that clinical AI does not cause undue harm or burden to individual patients, relying on this alone in the absence of formal regulation or oversight is not sufficient to guarantee the ethical advancement of AI in medicine. Our study demonstrates that for clinical uses of AI, physicians, and sometimes even developers, are likely to appraise the acceptability of clinical AI tools using the four prima facie principles of medical ethics: respect for autonomy, beneficence, nonmaleficence, and justice (Beauchamp and Childress, 1989; Gillon, 1994). Mittelstadt (2019) presents an argument for why these principles alone are not a sufficient framework for ensuring ethical AI; however, in the absence of formal regulation or oversight that enforces an alternative framework, physicians who ultimately need to make ethical judgments regarding clinical AI will likely do so using the principles with which they are most familiar.
Conclusion
Existing ethics frameworks and principles for medical AI have initiated important conversations in this rapidly advancing field, but these approaches have not yet been validated through empirical study of stakeholders' perspectives. In this open-ended qualitative study, physicians and AI researchers with experience developing or using clinical AI discussed their opinions and experiences regarding the ethical integration of AI in medicine. The use of a qualitative descriptive method and an open-ended approach encouraged participants to elevate the concerns that they felt were most important, rather than responding to concerns defined a priori, and to offer in-depth examples and explanations supporting their views. To our knowledge, this is the first open-ended study to specifically query physicians and AI researchers who have experience developing or using AI in medicine about their ethical views regarding AI's clinical integration.
This study is limited by its small sample size and narrow sampling frame. Notably, all physicians in this study had prior experience using or developing AI in medicine, and all AI researchers were based in academia; therefore, their views are not representative of all physicians or developers. Furthermore, the interviews were completed in 2021, prior to more recent advancements in large language models and generative AI, so the responses should be understood in that context. Future research is needed to determine the appropriate amount of information that should be disclosed to patients regarding the use of AI in their care and how physicians can best communicate this information. Additional studies are also urgently needed to better understand how physician decision-making is influenced by increased information regarding clinical AI, and how physician training in AI affects physicians' ability to understand, interpret, and apply AI results in clinical care.
Participants in this study questioned how much information and understanding is needed by both physicians and patients in order for AI to be used ethically in clinical practice, and agreed that increased information sharing or explainability may not necessarily yield better outcomes, especially if physicians are not adequately trained in how to interpret the outputs and limitations of clinical AI systems. Participants agreed that physician-AI collaboration is the most favorable and likely future for AI in medicine, as this relationship allows for the opportunity to mitigate potential risks, including potential threats to patient autonomy, and to maintain important aspects of the therapeutic relationship. Both researchers and physicians advocated for more diligent and thoughtful regulation of clinical AI, while recognizing the challenges and potential trade-offs of increased regulation. As AI becomes increasingly integrated into clinical care, and given the current state of under-regulation of AI tools and devices, caution should be applied to ensure that clinical integration utilizes diverse stakeholder input and robust safeguards across the development continuum, from AI research to the point of care.
Acknowledgements
The authors would like to thank Dr Laura Dunn, M.D., Kyle McKinley, M.P.H., M.F.A., and Jodi Paik, M.F.A. for their contributions to this project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Center for Advancing Translational Sciences [grant number R01-TR-003505].
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Laura W Roberts is the Editor-in-Chief of the journal Academic Medicine. The other authors have no relationships to declare.
Data availability
The disaggregated data underlying this study are not available, in order to protect the privacy and confidentiality of the participants interviewed.
Supplemental material
Supplemental material for this article is available online.
References