Abstract
Due to the potential scope and impact of artificial intelligence's (AI's) adoption in medicine, a comprehensive assessment of the potential ethical considerations arising during clinical integration is needed. Existing ethical frameworks and principles have aided in the identification of ethical considerations that may arise during the clinical integration of AI; however, our understanding of these considerations remains preliminary to the extent that it is not yet robustly informed by empirical research on key stakeholders' experiences and perspectives. Utilizing a qualitative descriptive approach, we completed in-depth semi-structured interviews with physicians (n = 11) and AI researchers (n = 10) who had experience developing or using clinical AI regarding ethical considerations that they have perceived in relation to their work. An analysis of the interviews identified considerations related to information sharing and understanding, the risks and systemic impacts of clinical AI, and opportunities for safeguards. Physicians and AI researchers raised questions about how much information and understanding is needed by both physicians and patients in order for AI to be used ethically in clinical practice, agreed that unintended impacts associated with clinical AI could pose threats to patient autonomy, and advocated for more diligent and thoughtful regulation of clinical AI innovation.
Introduction
Innovative technologies leveraging artificial intelligence (AI) offer immense promise for clinical care, from improving systems for prediction, diagnosis, and treatment to enhancing clinical workflows and decision support systems (Caballé-Cervigón et al., 2020; Rajpurkar et al., 2022; Stafford et al., 2020). Achieving robust ethics guidance and safeguards for implementing AI-based tools is widely recognized as critical to ensuring that these tools will be adopted safely and effectively in clinical settings (Morley et al., 2020). Early ethics work relating to AI in medicine has identified ethically relevant technical considerations, including issues related to generalizability and bias (Obermeyer et al., 2019), concerns regarding the privacy and security of systems that utilize and store health data (Iacobucci, 2017; Price and Cohen, 2019), and the importance of explainable AI (Amann et al., 2020); such findings have prompted frameworks for systematic ethical appraisal of the clinical AI development process (Char et al., 2018, 2020). More recently, some ethical frameworks and principles have highlighted the merits of understanding clinical AI integration as a "sociotechnical" problem and, thus, of assessing novel tools within the context of their use and in relation to their possible impact on immediate stakeholders (McCradden et al., 2023; Sand et al., 2022).
The above frameworks have demonstrated value in identifying and assessing ethically consequential aspects of the integration of clinical AI. Yet more must be understood about these issues before AI can be operationalized in practice (Prem, 2023). Recent systematic reviews of the ethics literature in clinical AI have identified a disconnection between high-level frameworks and principles (e.g. the Belmont principles: beneficence, non-maleficence, autonomy, and justice; National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979) and the limited empirical ethics literature (Murphy et al., 2021; Tang et al., 2023). Indeed, more empirical research is essential to develop and operationalize high-level guidance (i.e. what "should" be the case) that is adequately attuned to real-world experience (i.e. what "is") and can anticipate possible ethical tensions (Racine and Cascio, 2020). Due to the limited number of AI tools currently utilized in clinical practice (He et al., 2019; Shaw et al., 2019), such ethical frameworks have necessarily relied on assumptions about the risks and consequences of integrating AI in clinical care, and there has yet to be sufficient empirical or stakeholder-based research to test these assumptions. As noted by Murphy et al. in their 2021 scoping review of 103 ethics articles regarding AI in medicine, "Those leading the discussion on the ethics of AI in health seldom mentioned engagement with the end users and beneficiaries whose voices they were representing," and "without better understanding the perspectives of end users, we risk confining the ethics discourse to the hypothetical, devoid of the realities of everyday life" (Murphy et al., 2021).
Recent qualitative studies analyzing clinician attitudes toward real and hypothetical implementations of clinical AI tools have begun to fill this gap in the literature, providing an improved understanding of the ethical concerns raised in recent guidance and identifying novel concerns. Van Cauwenberge et al. (2022) and Tanaka et al. (2023) found that, in interviews relating to the incorporation of specific AI clinical decision support systems, physicians distinguished between parts of their job that they believed could and could not be effectively automated or replaced by AI, with these distinctions potentially based on aspects of their roles that they felt were crucial to their identities as physicians and human beings. In a broad interview study assessing GPs' attitudes toward clinical AI, Buck et al. (2022) found concerns relating to the maintenance of the physician–patient relationship in the context of clinical AI, supporting Samhammer et al.'s (2022) findings in which physicians described their belief that future AI decision support systems will need to uphold the "classical strengths" of the medical profession, primarily the shared decision-making model. The Samhammer et al. (2022) study additionally found that physicians distinguish between transparency and explainability, with a preference for understanding the efficacy of a clinical AI tool over its inner workings, complementing similar findings by Dlugatch et al. (2024).
These findings demonstrate the effectiveness of a stakeholder approach in identifying and describing ethically salient issues in the integration of clinical AI, including considerations for the relationship between physicians and AI, the maintenance of the therapeutic relationship and shared decision-making, and the perceived value of explainable AI. However, these studies were not intended to directly assess ethical views and often queried perspectives relating to specific real or hypothetical implementations of clinical AI. Furthermore, existing studies have primarily focused on physicians, and few have included other stakeholders, such as the AI researchers who design and develop such tools for use in medicine. Tech workers' ethical views regarding AI have been qualitatively assessed more broadly (Browne et al., 2024), but Tang et al. (2023), in a systematic review of empirical studies in medical AI, found no studies focusing on AI researchers' views. Prior studies by our team were among the first to examine AI researcher views regarding the algorithm development phase in medicine (Kasun et al., 2023; Kim et al., 2023), but not the integration phase. Notably, aside from our own work, no studies have directly queried both physicians and AI researchers about their ethical views, indicating that the full landscape of ethical considerations has yet to be empirically charted.
Due to the immense scope and impact of AI's adoption in medicine, a comprehensive assessment of the potential ethical issues arising during clinical integration is needed. While ethical principles and frameworks have provided a necessary first step in identifying such issues, exploration of stakeholder perspectives is critical to the ethical, and ultimately successful, adoption of AI in clinical practice. In light of the gap in knowledge stated above, we sought to understand the ethical implications associated with clinical AI integration as perceived by immediate stakeholders, namely, AI researchers and physicians. Utilizing an open-ended qualitative approach, we interviewed AI researchers and physicians who had experience developing or using clinical AI and who could therefore offer sufficiently informed views regarding its uptake.
Methods
Study design
The purpose of this interview study, which is part of a broader project (NCATS R01-TR-003505), was to describe the views of AI researchers and physicians regarding ethical considerations that they have encountered or anticipated in the development and application of AI in medicine (Kim, 2021). A qualitative descriptive approach was applied to the study design and data analysis, as this approach is predicated on the description and analysis of stakeholders’ experiences and perceptions using language and ideas that emerge directly from the stakeholders themselves (Neergaard et al., 2009; Sandelowski, 2000). Whereas other methods of qualitative research and analysis prioritize the development or advancement of theory, qualitative description is most conducive to the emergence of novel considerations that may not yet be fully understood and that are grounded in stakeholder experiences. Considering the limited amount of literature describing stakeholder perspectives relating to ethics in clinical AI, qualitative description was identified as the most appropriate method for analyzing the results of this study.
Participants and procedures
A purposeful sampling technique (Palinkas et al., 2015) was used to identify and recruit participants who were especially knowledgeable about the topic being studied: machine learning (ML) researchers who had experience developing AI or ML tools for use in medicine, and physicians who had experience developing or using such tools in their clinical practice. A total of 61 potential candidates (33 ML researchers and 28 physicians) from 10 U.S.-based R1 research universities (Carnegie Classification of Institutions of Higher Education, n.d.) were identified by consulting relevant literature and seeking recommendations from experts in the fields of AI and ML, medicine, and AI ethics.
Recruitment emails containing details about our project and an electronic interest form were sent to these 61 potential participants; 29 completed the interest form, of whom 21 (10 AI researchers, 11 physicians) scheduled and completed an interview. Interviews continued until no new substantive themes emerged from additional data collection and content saturation was reached (Saunders et al., 2018). The final cohort included participants from six U.S. R1 research universities. Participants in the AI researcher group held, at minimum, master's degrees, were working toward a Ph.D. in computer science or a related field, and represented academic departments of biomedical informatics, engineering, and computer science; participants in the physician group held M.D.s and represented the specialties of radiology, psychiatry, and surgery. Complete demographic information for participants can be found in the Supplemental Material.
Semi-structured interviews were used in this study to facilitate in-depth, open-ended discussion. The interview guide consisted of seven primary questions regarding the ethical considerations of AI in medicine; relevant follow-up questions were asked in response to participants' replies to the primary questions. Interview questions were purposefully broad to allow participants the opportunity to address the areas of clinical AI that they found most ethically significant, as opposed to responding to topics or ideas defined a priori by our research team. Participants could thus identify ethical considerations that they had experienced in their own work but that have not yet been adequately described in the literature. Interviews were completed via Zoom between November 2020 and April 2021 and were conducted by one of our team's four trained interviewers. Interviews averaged 52 minutes, 6 seconds (range: 29 to 95 minutes) and were audio recorded for transcription.
Data coding
Interviews were transcribed and de-identified by members of our research team. Data analysis was completed using conventional qualitative content analysis, which involves breaking down transcribed data into descriptive units that are named and sorted based on their substantive content (Downe-Wamboldt, 1992; Hsieh and Shannon, 2005). This method allows for the development of codes and themes that are inductively derived from the data, rather than applied based on a priori assumptions about its content. As is common in qualitative research, we aimed to achieve consistency in our qualitative coding and analysis (Noble and Smith, 2015). The coding (i.e. the labeling of the data) and analysis (i.e. the sorting and describing of the data) processes each consisted of a series of steps that are depicted in the Supplemental Material.
An initial round of open coding was performed on each of the transcripts by two randomly assigned research team members, who independently identified substantive content from the interview text and suggested potential code labels and definitions descriptive of the content. Upon completion of a round of open coding, the full research team held five meetings to discuss the reviewed transcripts, as well as the rationale for suggested codes and definitions. In cases where individuals who had reviewed the same transcript had code labels or definitions that diverged, they collaboratively refined the code labels and definitions with input from the full team until agreement was reached. At the end of this series of meetings, the first draft of the codebook was established.
A round of intermediate coding was then performed in which a subset of transcripts was again reviewed by two randomly assigned research team members, this time using the draft of the codebook as a guide. Each team member labeled the content within their assigned transcripts using the code labels and definitions in the codebook, specifically highlighting content that they felt was not adequately described by any existing codes. Upon the completion of intermediate coding, the full research team met twice more to compare the coded units, refine the code names and definitions, and incorporate new codes and definitions as needed based on the results of intermediate coding. In cases where individuals who had reviewed the same transcript had coded the content differently, they described the rationale for their decision to the full team, who worked together to refine the code labels and definitions as needed to ensure consistent future coding. The result of these collaborative meetings was the final version of the codebook, which contained 30 codes and definitions describing categories that emerged from the content of the interviews, as well as example quotes for each code.
Interview transcripts and the codebook were uploaded to NVivo 20 for final coding, which was completed by a single author (KR) with a background in qualitative coding and analysis in order to ensure coding consistency across all transcripts. The final coded transcripts were reviewed by the principal investigator (JPK) to ensure that codes were applied thoroughly and consistently.
Data analysis
After final coding, the full research team performed qualitative content analysis on the data set (Downe-Wamboldt, 1992; Hsieh and Shannon, 2005), as depicted in the Supplemental Material. Upon an initial review of the 30 codes and their content, it became clear that participants had addressed topics relating to three distinct phases of AI development for medicine: the problem formulation phase (Phase 1), the algorithm development phase (Phase 2), and the clinical integration phase (Phase 3) (see Figure 1). In order to reflect the depth at which participants addressed these topics, the results presented in this manuscript describe findings from the analysis of codes related to Phase 3. The analysis of findings related to Phases 1 and 2 has been published in earlier reports (Kasun et al., 2023; Kim et al., 2023).

Figure 1. Inductive categories and codes derived from interviews with physicians and AI researchers. Adapted from: Kim JP, Ryan K, Kasun M, et al. (2023) Physicians' and machine learning researchers' perspectives on ethical issues in the early development of clinical machine learning tools: Qualitative interview study. JMIR AI 2(1): e47449. ¹These codes contained content related to multiple phase categories and are therefore listed more than once in this figure.
After the coded content had been sorted into phases, the full research team continued analysis in order to identify major categories describing the overarching themes that emerged within each phase. After major categories had been identified, the team continued to analyze and sort the data at a more granular level, identifying subthemes that described specific ideas and topics pertaining to the larger categories. The team endeavored to develop subthemes that were descriptive and inclusive of all related content, whether that content represented agreement amongst those who spoke about a topic or offered varied or diverging views. Where a diverging view was expressed, it was explicitly noted.
Ethics review
This study obtained human subjects research approval from the Institutional Review Board of Stanford University on November 16, 2020 (#58118). A PDF copy of the IRB-approved informed consent form was emailed to all potential participants; participants provided verbal consent prior to the start of the interview. Participants were compensated in the form of a $150 electronic gift card.
Results
All 21 participants in this project raised considerations related to the integration of AI in medicine (Phase 3, as depicted in Figure 1). From the 15 inductive codes associated with Phase 3, three major categories and nine subthemes emerged (see Figure 2). These categories and their related subthemes are described in detail in this section.

Figure 2. Major categories, affiliated codes, and related subthemes describing ethical considerations in the integration of clinical AI, as described by AI researchers and physicians.
Information sharing and understanding
Physician understanding of clinical AI
Physician interviewees in this study grappled with the question of what level of understanding they should have about an algorithm and its limitations in order to act as "conscientious users" [Table 1, Section 1: Quote 15.2]. They felt that their ability to assess and understand AI tools could influence their confidence in a given device and their willingness to incorporate it into clinical practice. As stated by one physician: "You're asking clinicians to make decisions based on some kind of output that they may not necessarily fully understand, and that's confusing. I think that's not something that clinicians will be comfortable doing" [1, 1: 15.1]. Physicians and researchers commented that healthcare providers will need to receive training in how to interpret and incorporate the results of clinical AI, and how to identify its limitations [1, 1: 06, 20.1], with one participant summarizing that providers will need to learn "how to interpret algorithms, how to understand what AI does and does not do, and how to understand the inherent biases that are in AI so that they understand [not only] how the tool can work, but also the limits of it" [1, 1: 20.1]. Physicians and researchers agreed that physicians do not currently receive the training necessary to understand clinical AI tools sufficiently and make the requisite judgments regarding how to apply them in patient care [1, 1: 10, 20.2, 22, 33].
Table 1. Selected quotes from AI researchers and physicians regarding information sharing and understanding.
Determining the appropriate balance of information to communicate to patients
Physicians in this study elevated the importance of information sharing and the shared decision-making model. However, they struggled with questions about the acceptable level and types of information to share with patients regarding the AI used in their care. Several physicians referenced certain "practical" barriers that may limit providers' ability to fully inform patients about the use of or risks associated with such tools, one being that the complexity of the underlying algorithms may make them challenging to explain during the informed consent process [Table 1, Section 2: Quotes 12, 14, 25.1]. These physicians agreed that, when it comes to clinical AI, it is especially easy to provide detail that is "exhausted to the point of being uninformative" [1, 2: 12], and questioned whether it is "beneficial to the patient's care to provide potentially extraneous information that might actually not help care and may even hinder care?" [1, 2: 15]. When discussing the appropriate level of information to share with patients regarding clinical AI, several physicians referenced how many processes in medicine, including those involving other types of medical devices or technology, are often not fully explained to patients [1, 2: 15, 25.2], but they were hesitant to conclude whether clinical AI devices should be regarded similarly.
The role of explainability in clinical decision-making
Whereas AI researchers in this study agreed on the importance of prioritizing explainability while developing clinical AI (Kasun et al., 2023), physicians expressed more varied opinions regarding the role of explainability in AI's clinical implementation and use at the point of care. One physician described how explainability contributes to physicians’ ability to provide quality patient care and support robust shared decision-making, noting, “Patients are supposed to know why you’re making certain decisions. If you’re telling them, ‘I’m making a decision because that's what the algorithm said,’ I think it's questionable whether the patient has enough information in that scenario, because you can't explain why the algorithm made that decision” [Table 1, Section 3: Quote 22]. Others referenced how low-explainability algorithms could contribute to bias [1, 3: 20.1, 25]. One of these physicians, however, clarified that there are limits to the amount of explainability that they would find helpful, noting “I don’t need to know the specific data points… I want to be able to understand the result that it's giving and how I should factor that into my decision-making” [1, 3: 20.1]. One physician also questioned the assumption that increased explainability is always beneficial, asking “does transparency have a price?” in recognition of the possibility that providing physicians with more information about an AI tool's decision could have unexpected, and not necessarily positive, impacts on how they understand, view, or treat their patients [1, 3: 20.2].
Risks and unintended impacts of clinical AI
Challenges of predicting the impacts of clinical AI
Physicians and researchers in this study expressed concerns that the combined novelty, speed, and scale of AI innovation could entail new risks for patients, physicians, or healthcare systems that stakeholders may not necessarily be able to anticipate. One researcher described the evolution of AI and its accompanying risks as a “branching tree,” with each new innovation resulting in a host of possible ethical dilemmas and risks [Table 2, Section 1: Quote 13]. A physician highlighted the effect that the scale of AI integration could have on consequent risks, arguing, “The degree of ripple effects is directly correlated with the scale at which your solution is deployed… And with AI it's so easy, it's so cheap to have one product actually be used by a lot of people” [2, 1: 15]. Other participants expressed concerns about the possibility of clinical AI tools negatively impacting health outcomes, even when the tools perform as expected. A researcher argued that it is difficult to predict these secondary risks of AI tools, noting, “We can't just assume that giving people that score is going to lead to better outcomes. For particularly high risk [patients], it may lead to physicians intervening in more aggressive ways than they otherwise would and causing harm, or any number of different things” [2, 1: 18].
Table 2. Selected quotes from AI researchers and physicians regarding risks and unintended impacts of clinical AI.
Impacts on patient autonomy and the therapeutic relationship
Physicians and AI researchers described aspects of patient autonomy and the shared decision-making process that they felt could be impacted by the widespread use of clinical AI tools, such as the possibility of shifting attention away from interpersonal aspects of care and limiting the set of alternatives in the decision-making process [Table 2, Section 2: Quotes 06.1, 15.1, 15.2, 26, 27]. One physician summarized: "[Physicians] ultimately allow the patient autonomy, and I wonder if the AI would take away some of the patient's autonomy, because they're like, 'this is the only right way,' versus 'these are the options which may not have the best outcomes, but are still options'" [2, 2: 27]. One researcher expressed a related concern that over-integration of an AI system could limit patients' ability to seek a second opinion that is truly independent: "[Right now] you can just choose a different [physician] to talk to, whereas if there's only one AI, then what do you do?" [2, 2: 06.1] Underlying many of these comments was a concern about the potential impact that AI could have on the therapeutic relationship, which physicians did not feel could be substituted with technology: "There is this important part of a therapeutic relationship where you're just with someone… I think that patients really appreciate that and it just makes a difference" [2, 2: 15.2].
Other physicians and researchers discussed patient autonomy in the context of patients’ ability to decide when and how much AI should be incorporated in their medical care. A researcher asked, “How much agency do the people have in not having their data included… or not using the system at all?,” which resonated with a physician's question, “To what extent should patients be able to opt out of their provider having these digital tools running in the background?” [2, 2: 06.2, 20]. A different researcher anticipated that a possible ethical tension between beneficence and voluntarism could emerge in cases where an AI tool provides a clear and objective health benefit (i.e. more accurate health risk detection) that nonetheless stems from AI monitoring to which patients may not have consented, noting that “Sometimes [at-risk patients identified by AI] don't want to be detected” [2, 2: 10].
Impacts relating to fairness and equity
There was agreement among the physicians that clinical AI has the potential to benefit different patient populations disproportionately, building on points made by AI researchers when discussing the development phase (Kasun et al., 2023). Multiple physicians discussed the limited generalizability of the datasets used to train AI systems, due to differential access to health care and differential use of the technologies that collect such information [Table 2, Section 3: Quotes 25.1, 27]. One physician expressed worry about a future in which AI could potentially "create a different class," where "some people only get the digital interventions and then the people that can afford it get the premium face to face" [2, 3: 25.2]. Despite these concerns, several physicians agreed that medical AI could potentially "enable broader access" to needed healthcare [2, 3: 23], and that even if AI provides a lower quality of care, "it probably is more beneficial to give a poorer [AI] therapy to a lot of people than a very good [in-person] therapy to a few people" [2, 3: 17].
Heterogeneity of risk of clinical AI
In general, physicians and researchers agreed that the risks and benefits of any AI tool are highly heterogeneous and dependent on the context in which it is used, with one researcher noting, "The ethical considerations are very context specific" [Table 2, Section 4: Quote 33], and a physician stating, "Each clinical problem has its own risk/benefit profile" [2, 4: 14]. Participants said that multiple aspects of clinical integration, including the domain of use (e.g. prediction, diagnosis, or intervention), the risks accompanying the health concern, and the level of expert oversight, are important considerations when assessing the risk/benefit profile of a medical AI tool [2, 4: 13, 20, 25].
Safeguards for clinical AI
The relationship between physicians and AI
When discussing both the current state and the future of AI in medicine, physicians and researchers noted that human moderation of AI tools is necessary for the acceptance and advancement of such tools. Physicians and AI researchers agreed that fully autonomous AI in medicine is not likely, with one researcher emphasizing that "leaving the doctor to make decisions is crucial and not having any part of the pipeline without a human involved" [Table 3, Section 1: Quote 06]. Applications that attempt to complete tasks end-to-end without clinician input were not expected to be successful [3, 1: 05], in part because it was expected that "people will not relinquish that level of control and oversight to something that can kill people" [3, 1: 23.2].
Table 3. Selected quotes from AI researchers and physicians regarding safeguards for clinical AI.
Participants in both groups stated that they expect the most successful applications of AI in medicine to be ones that "change the everyday workflow of the clinician so they can focus on what's most important" [3, 1: 11]. Physicians expressed excitement about the possibilities of AI in medicine, with several referencing how physicians' current strengths can be supplemented by AI and suggesting that "the human plus the AI is better than either one alone" [3, 1: 14.1, 20]. Several physicians viewed medical AI as the next step of technological advancement in medicine, which has historically incorporated new technologies in order to improve care delivery systems and outcomes, and which will continue to require the expertise of trained physicians to correctly operate and interpret. For instance, one radiologist observed that medicine has "been bringing high technology in service of patients for over a century, and [AI] is just going to be another one of those" [3, 1: 14.2].
Increasing regulation while encouraging innovation
Participants in both groups agreed that AI is currently under-regulated in medicine, but there was no consensus regarding how to enhance regulation while continuing to encourage innovation. Researchers who commented on this topic endorsed more robust regulation in order to improve trust in medical AI, commenting that a "top-down protocol…will be utterly necessary to build widespread trust" [Table 3, Section 2: Quote 10], and that "regulation is the only way" to build this trust [3, 2: 06]. Physicians considered medical AI tools to be medical devices requiring regulation, certification, or oversight [3, 2: 12.1, 14, 17], with one noting that their comfort level regarding the use of AI in patient care depends on regulatory approval: "I would not put anything on a patient's report… that would have potential risk if it didn't have some type of regulatory approval" [3, 2: 23.1]. While encouraging more formalized regulation of medical AI, physicians simultaneously worried that "over regulation of the field might slow down the development of AI" [3, 2: 23.2], which they agreed "would be bad for patients, ultimately" [3, 2: 12.2].
Discussion
In this open-ended qualitative study analyzing the ethical perspectives of physicians and AI researchers with experience developing or using AI in medicine, participants identified a series of considerations relating to the clinical integration of AI. Three main findings, described in detail below, emerged from this analysis: (1) there were many open questions about how clinical AI may impact physician and patient decision-making, including how much each party needs to be informed about and understand in order to support its ethical use; (2) a collaborative physician-AI relationship was seen as the most acceptable path forward for mitigating potential risks, anticipating possible systemic or otherwise unintended impacts, and promoting patient autonomy; and (3) increased regulation of clinical AI appeared likely to improve physician trust in these tools but was thought to entail unique accompanying challenges, including potential impacts on innovation.
The tension between information and understanding
Physicians in this study cited how explainability could increase physicians' comfort and confidence in using an AI tool, in part because of their desire to be able to fully explain and justify their decision-making process to patients. In their discussions, however, physicians prioritized access to information regarding a clinical AI tool's risks, benefits, and limitations over the ability to pinpoint the specific data points behind an AI decision. Although explainable AI has been widely identified in the ethics literature as necessary to AI's adoption in medicine (Amann et al., 2020; Kiener, 2021; Kundu, 2021; Liu et al., 2022), numerous recent stakeholder studies have found that physicians prefer contextual, model-agnostic information (Diprose et al., 2020; Dlugatch et al., 2024; Samhammer et al., 2022). These findings demonstrate that there are likely limits to the amount of explainability that physicians view as beneficial, and that they may prefer access to information that is directly relevant to their ability to provide patient care, such as the risks and benefits of a given tool, its performance in different subpopulations, and guidance regarding how best to incorporate its results into clinical practice.
Several empirical studies have found that, while explainable AI contributes to increased physician trust in these systems, the clinical benefits of such transparency have yet to be demonstrated (Clement et al., 2021; Diprose et al., 2020; Markus et al., 2021). Participants in our study provided potential context for these prior findings in their discussions about physician understanding of AI. They acknowledged that increased explainability or additional information about clinical AI will be advantageous to the extent that physicians have the ability to effectively assess the merits and limitations of an AI tool and accurately interpret and apply its results. Moreover, they noted that physician training in how to analyze and interpret AI, or perhaps even the involvement of technical experts, will likely be needed. Samhammer et al. (2022) came to a similar conclusion, finding that AI explanations must be interpreted in context and rely on physicians to apply their understanding of the system relative to their clinical goals. Efforts to develop more explainable AI should therefore be accompanied by efforts to train physicians in how to assess and interpret such programs, as explainability or transparency without understanding is unlikely to benefit physicians or patients. This is especially pertinent considering the limited amount of research that has been done to determine how best to incorporate AI into the medical curriculum (Grunhut et al., 2021).
Discussion regarding the tension between information sharing and understanding extended to patient care as well. Physicians were unsure how best to inform patients about the use of AI in their care, and whether that information would be beneficial to, or even desired by, patients. Recent studies have found that revealing information about an AI model's performance increases patients' reported trust in and perceived usefulness of the model, and that informing patients about the use of AI in their care may improve their satisfaction (Sun et al., 2023; Zhang et al., 2021). Physicians in our study, however, acknowledged issues related to patient understanding and healthcare logistics that they felt could limit their ability to effectively share this information with patients. They described how a practical limit to the level of information physicians share with patients already exists for other technologies used in patient care, but they were less sure whether AI is merely another case calling for this professional discretion. Reconciling patients' desire for information about clinical AI with physicians' ability to provide it in an effective manner that does not upend interrelated clinical processes is a crucial step toward ensuring the future ethical use of these tools.
Mitigating risks and unintended impacts through physician-AI collaboration
When discussing potential risks associated with integrating AI in medicine, participants in this study expressed notable concern regarding the possibility of impacts on healthcare systems, physicians, and patients that they considered to be unintended. These concerns seemed to be heightened by the potential for wide deployment of AI tools; the many possibilities for "branching" or "rippling" effects; and the heterogeneity of medical AI applications, which entail risks that are specific to the tool, the characteristics of the target population, and the timing or type of the intervention. Participants' wariness of possible unintended or unidentified risks that may develop as a systemic consequence of widespread adoption, or that may be impossible to identify immediately, does not appear to have been previously reported in the literature, and raises the question of how it may influence physicians' willingness to use AI tools in their clinical practice. More research is needed to explore and identify secondary consequences of AI integration, including how clinicians' judgments and behavior may be influenced by AI tools, and how the impact of AI integration may bleed over into other clinical activities that it was not initially expected to influence. Furthermore, as ethical guidance for clinical AI progresses, it should be noted that understanding clinical AI's risk/benefit profiles may be as much a matter of evaluating specific tools as of developing general guidance, and that safeguards will ultimately need to be tailored to these specific risk/benefit profiles.
Because of the lack of confidence in being able to predict or account for many of the effects of AI tools in medicine, participants in this study strongly advocated for clinical AI that serves as a supplement to physician expertise, not as a substitute. The involvement of a human professional who reviews the AI output was viewed as a key safeguard against potential risks and unintended consequences associated with medical AI. This finding adds to similar results from many other studies, which identified collaborations between physicians and AI that support the physician-patient relationship as the most desirable future for AI in medicine (Ahuja, 2019; Čartolovni et al., 2023; Nelson et al., 2020; Van Cauwenberge et al., 2022; Yang et al., 2022).
Participants in this study notably viewed the involvement of physicians as a way to protect against threats to patient autonomy that could emerge as a result of increased integration of AI into clinical care, which was identified as a significant "unintended" consequence (McDougall, 2019; Sauerbrei et al., 2023). They noted that, within the shared decision-making model common in current medical practice, patients generally are able to express their autonomy in a variety of ways: by choosing their physicians, advocating for their health needs and concerns, sharing additional information or perspectives that may help their physicians to refine their advice (e.g. by prompting counterfactual reasoning), requesting second opinions, opting out of certain treatments or interventions, and working with their physicians to determine preferred treatment directions. Participants worried that clinical AI could diminish patient autonomy in these scenarios, as AI was viewed as less flexible in its decision-making than physicians, and the possibility of single, widespread AI systems was seen as a limit to patient choice. Interestingly, while prior qualitative studies connected physicians' preference for physician-AI collaboration to their desire to preserve their own autonomy in clinical scenarios (Tanaka et al., 2023; Van Cauwenberge et al., 2022), physicians in this study qualified that they also desired to mitigate potential risks to patient autonomy.
Regulation and considerations for the ethical advancement of AI in medicine
Physicians and AI researchers in this study called for increased regulation of clinical AI, with physicians in particular referencing the role of the U.S. Food and Drug Administration (FDA). Researchers identified how regulation would increase trust in AI tools, supporting the argument made by Kerasidou et al. (2022) that strong legal and regulatory frameworks form the basis from which trust in medical AI could emerge. Physicians agreed that FDA approval would increase their willingness to use AI tools, but they also recognized the challenges of regulating AI and worried that over-regulation could stymie needed innovation. Calls for increased federal regulation of clinical AI are common; however, recent reviews have documented numerous deficiencies in the current FDA model for regulating clinical devices, including the use of traditional 510(k) pathways that do not require clinical evidence of safety, effectiveness, or equity; the use of non-equivalent devices as predicates; and the limited number of clinical evaluations (Gerke, 2021; Lee et al., 2023; Parikh et al., 2019).
In recognition of these limitations in the extent and quality of the regulation of clinical AI, caution should be applied to ensure that the broad agreement that ethical AI in medicine requires a human (i.e. a physician) in the loop (Ahuja, 2019; Čartolovni et al., 2023; Yang et al., 2022) is not used as an opportunity to place the onus of the ethical use of AI solely on physicians. While physician judgment grounded in professionalism and facility with ethical principles is an extremely valuable resource for ensuring that clinical AI does not cause undue harm or burden to individual patients, relying on this alone in the absence of formal regulation or oversight is not sufficient to guarantee the ethical advancement of AI in medicine. Our study demonstrates that for clinical uses of AI, physicians, and sometimes even developers, are likely to appraise the acceptability of clinical AI tools using the four prima facie principles of medical ethics: respect for autonomy, beneficence, nonmaleficence, and justice (Beauchamp and Childress, 1989; Gillon, 1994). Mittelstadt (2019) presents an argument for why these principles alone are not a sufficient framework for ensuring ethical AI; however, in the absence of formal regulation or oversight that enforces an alternative framework, physicians who ultimately need to make ethical judgments regarding clinical AI will likely do so using the principles with which they are most familiar.
Conclusion
Existing ethics frameworks and principles for medical AI have initiated important conversations in this rapidly advancing field, but these approaches have not yet been validated through empirical study of stakeholders' perspectives. In this open-ended qualitative study, physicians and AI researchers with experience developing or using clinical AI discussed their opinions and experiences regarding the ethical integration of AI in medicine. The use of a qualitative descriptive method and an open-ended approach encouraged participants to elevate the concerns that they felt were most important, rather than responding to concerns defined a priori, and to offer in-depth examples and explanations supporting their views. To our knowledge, this is the first open-ended study to specifically query physicians and AI researchers who have experience developing or using AI in medicine about their ethical views regarding AI's clinical integration.
This study is limited by its small sample size and narrow sampling frame. Notably, all physicians in this study had prior experience using or developing AI in medicine, and all AI researchers were based in academia; therefore, their views are not representative of all physicians or developers. Furthermore, the interviews were completed in 2021, prior to more recent advancements in large language models and generative AI, so the responses should be understood in that context. Future research is needed to determine the appropriate amount of information that should be disclosed to patients regarding the use of AI in their care and how physicians can best communicate this information. Additional studies are also urgently needed to better understand how physician decision-making is influenced by increased information regarding clinical AI, and how physician training in AI affects physicians' ability to understand, interpret, and apply AI results in clinical care.
Participants in this study questioned how much information and understanding is needed by both physicians and patients in order for AI to be used ethically in clinical practice, and agreed that increased information sharing or explainability may not necessarily yield better outcomes, especially if physicians are not adequately trained in how to interpret the outputs and limitations of clinical AI systems. Participants agreed that physician-AI collaboration is the most favorable and likely future for AI in medicine, as this relationship allows for the opportunity to mitigate potential risks, including potential threats to patient autonomy, and to maintain important aspects of the therapeutic relationship. Both researchers and physicians advocated for more diligent and thoughtful regulation of clinical AI, while recognizing the challenges and potential trade-offs of increased regulation. As AI becomes increasingly integrated into clinical care, and given the current state of under-regulation of AI tools and devices, caution should be applied to ensure that clinical integration utilizes diverse stakeholder input and robust safeguards across the development continuum, from AI research to the point of care.
Acknowledgements
The authors would like to thank Dr Laura Dunn, M.D., Kyle McKinley, M.P.H., M.F.A., and Jodi Paik, M.F.A. for their contributions to this project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Center for Advancing Translational Sciences [grant number R01-TR-003505].
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Laura W Roberts is the Editor-in-Chief of the journal Academic Medicine. The other authors have no relationships to declare.
Data availability
The disaggregated data underlying this study are not available, in order to protect the privacy and confidentiality of the participants interviewed.
Supplemental material
Supplemental material for this article is available online.
References