Abstract
The use of artificial intelligence (AI) systems in healthcare provides a compelling case for a re-examination of ‘gross negligence’ as the basis for criminal liability. AI is a novel form of agency, often built on self-learning architectures, with the capacity to make autonomous decisions. Healthcare practitioners (HCPs) will remain responsible for validating AI recommendations but will have to contend with challenges such as automation bias, the unexplainable nature of AI decisions, and an epistemic dilemma when clinicians and systems disagree. AI decisions are the result of long chains of sociotechnical complexity with the capacity for undetectable errors to be baked into systems, which introduces a new dimension of moral luck. The ‘advisory’ nature of AI decisions constructs a legal fiction, which may leave HCPs unjustly exposed to the legal and moral consequences when systems fail. On balance, these novel challenges point towards a legal test of subjective recklessness as the better option: it is practically necessary; falls within the historic range of the offence; and offers clarity, coherence, and a welcome reconnection with ethics.
Introduction
The National Health Service (NHS) is under immense financial pressure and clinicians and the most senior management have warned about the risk this presents to patient safety. 1 At the same time, some legal responses to errors have been accused, by the medical profession, of being uncompromising and lacking in appreciation of context. 2 The continuing use of the criminal law is hard to reconcile with current policy, which aims to foster an open and transparent culture of reporting accidents and learning lessons. 3 This tension reached a new level of intensity with the General Medical Council’s decision in 2018 to pursue an appeal to the High Court to strike off Dr Bawa-Garba following her conviction for gross negligence manslaughter (GNM). 4
To briefly contextualise the law in the jurisdiction of England and Wales, GNM is a common law offence, which is committed when the prosecution can prove that the defendant owed a duty of care to the victim, that they breached that duty of care by their conduct, and that this breach caused the victim’s death. In addition, the prosecution must prove that the defendant’s conduct was so grossly negligent that it amounts to a criminal offence. 5
In the House of Lords case of Adomako, Lord Mackay held that the essential question for the jury is whether, having regard to the risk of death involved, the conduct of the defendant was so bad in all the circumstances as to amount to a criminal act or omission.
The storm around the case of Dr Bawa-Garba had been gathering over a number of years, largely as a result of an increase in referrals to the Crown Prosecution Service (CPS), 12 increased prosecutions, 13 an arguably lower threshold for initiating prosecutions, with tougher sentences, 14 and professional sanctions. 15 At the same time, the public, police, and sections of the media continue to be guided by the belief that healthcare practitioners (HCPs) can be punished into safer care. 16 This article makes no claims about the appropriate level of prosecutions, but the flurry of medical manslaughter cases during the 2010s only represents the tip of the iceberg; interviews under caution and referrals to the CPS cause immense stress to HCPs: the very low likelihood of an eventual conviction, or the prospect of a successful appeal may offer little comfort to clinicians. As Quick argues, ‘the process is the punishment.’ 17 The medical profession has become increasingly worried, and as noted by Prof Don Berwick, ‘fear is toxic’ for clinicians and their patients. 18 As a result, The Williams Review and Hamilton Review have both called for fairer systems and procedures around the use of the criminal law so that HCPs may operate ‘without fear of retribution’. 19 It remains unclear whether the fears from a couple of years ago are starting to recede, partially as a result of the perplexing developments within recent case law.
The introduction of artificial intelligence (AI) systems presents a novel perspective on this familiar reform issue.
The first four sections of this article introduce and describe the inherent novel challenges of advisory AI systems and the implications for medical practice, introducing an argument that the dangers of automation bias (AB) and the potential jury approach to AI-induced errors may present a risk to clinicians. The remainder of the article then examines GNM in the jurisdiction of England and Wales and demonstrates that the coming AI healthcare revolution makes a compelling case for a reconsideration of ‘gross negligence’ as the basis for criminal liability, and that the appropriate legal test should be subjective recklessness. A proposed shift to subjective recklessness is already widely advocated for within the legal literature and the introduction of AI systems adds a new dimension and urgency to extant arguments. 22 If, as anticipated, AI systems become more involved in a greater proportion of clinical decisions, the need for reform may be difficult to ignore.
AI in healthcare
AI is not a new technology and has endured a fitful history since the term was first coined in 1956. 23 Development has ebbed and flowed between modest steps forward and periods of inertia, often referred to as AI winters. 24 During the last decade, substantial progress has been made and AI has benefitted from a long summer, facilitated by the convergence of the ever-expanding availability of big data, 25 the unprecedented speed and reach of cloud computing platforms, and the innovation of increasingly sophisticated machine learning algorithms. There is no universal definition of AI, which is best described as a portfolio of technologies, or a growing family. While many definitions exist within the literature, most now recognise that AI is a novel form of agency with capacity to learn from data; it may be embodied within a physical device, or as software that is instantiated within a system. 26
Investment has flowed into the sector and the UK government has invested over £250 million into AI in healthcare. 27 This enthusiasm among investors and policymakers is built on the hope that these new AI technologies can radically transform patient care, as systems across the globe struggle with rising costs and worsening outcomes. 28 The mounting pressures of the pandemic are also likely to accelerate efforts to incorporate AI solutions. However, the twin benefits of saving costs and improving outcomes for patients are distinct challenges which can be viewed from separate levels of abstraction. 29
Saving costs is a system goal at institutional and sectoral level. For example, cost savings may be possible by transformative cooperation between machines and doctors 30 with systems to support clinicians, monitor patients, and automate labour-intensive processes. Recent publications like the AOMRC report 31 and the Topol review 32 set out many possible efficiency benefits for the workforce in healthcare: research suggests that more than half of the clinical workforce will be routinely using AI predictive analytics, image interpretation, and natural language processing within a decade. 33
At the individual HCP level of abstraction, AI systems may also reduce medical errors by providing reliable advice on issues such as diagnosis, treatment choice, and treatment or care planning. For example, image recognition algorithms have demonstrated potential in interpretation of head computed tomography scans, 34 as well as in the diagnosis of malignant tumours in breasts, 35 lungs, 36 skin, 37 and brain. 38 Decision-making with support from AI systems has the potential to improve the performance of even experienced radiologists in diagnosing lung cancers. 39 Avoiding common errors may become a ‘core competency’ of AI, as it can putatively avoid inattention, fatigue, and cognitive biases. 40 However, claims that AI can reduce individual errors are not guaranteed and there is tension between this utopian vision of error-free healthcare and the economic objective of driving down costs: if responsibility attribution is unjust, then a defensive approach to AI by clinicians may drive costs in the wrong direction. The long-standing claims around the negative impacts of liability are difficult to evidence and quantify and may have developed into a ‘jaded cliché’, 41 but the reality is complicated because the criminal law is one of a myriad of relevant factors. There are concerns that defensive practice may harm patients through over-testing, over-diagnosis, and wasting of scarce resources. 42 On the other hand, there is a perspective increasingly held by the civil courts that legal liability may be standard enhancing which could eventually reduce recriminations and litigation. 43 However, the criminal law is the highest form of moral condemnation within society, and it is unrealistic to expect that there will not be consequential changes to medical practice and underlying clinician behaviour where it is invoked.
AI systems are likely to be introduced when they can consistently outperform HCPs in a given task; 44 indeed, when a system surpasses the abilities of human HCPs it may be unethical not to use it. It is likely to become the standard of care. 45 In the areas where machines can outperform HCPs, progress is already under way in establishing pathways to clinical use. 46 This means that AI systems will be introduced when they make fewer errors: not when they are infallible. If an AI system can be correct 97% of the time and a human HCP has a 94% accuracy rate, then the rate of misdiagnosis should fall, and lives will be saved. However, AI systems will continue to make unexpected errors that may lead to fatalities.
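To make the aggregate effect concrete, the short calculation below uses the hypothetical accuracy figures quoted above; the numbers are illustrative only and are not drawn from any deployed system.

```python
# Illustrative only: expected misdiagnoses per 1,000 patients using the
# hypothetical accuracy figures quoted above (97% AI vs 94% HCP).
patients = 1_000
ai_accuracy, hcp_accuracy = 0.97, 0.94

ai_errors = patients * (1 - ai_accuracy)    # 30 expected misdiagnoses
hcp_errors = patients * (1 - hcp_accuracy)  # 60 expected misdiagnoses

print(f"Expected misdiagnoses per {patients} patients:")
print(f"  AI system: {ai_errors:.0f}")
print(f"  Human HCP: {hcp_errors:.0f}")
# The aggregate error rate halves, yet 30 unexpected AI errors remain,
# any one of which may prove fatal.
```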
The types of AI examined in this article are considered advisory systems. 47 They will not be legally responsible for making decisions. There may be scope in future to introduce closed loops where decisions are made independently by machines, 48 but at present an AI system will advise the HCP to take a particular action and then the human will remain responsible for the implementation of the care. Under current EU Law, allowing AI systems full autonomy over healthcare decisions would not be permitted. 49 However, this interface between man and machine bears further scrutiny. The argument that the systems are ‘advisory’ does not necessarily reflect the way that they will operate in reality. In the subsequent sections of this article I will argue that AB, the unexplainable nature of AI decisions, and the subsequent epistemic vices construct a legal fiction where HCPs may be unjustly exposed to criminal liability for AI errors.
Unexplainable decisions
At this juncture it is important to consider the first key challenge of many AI systems: they make predictions but do not give explanations. Historically, there have been more primitive uses of rules-based AI in devices such as electrocardiograph machines, or defibrillators. These devices had to be explicitly programmed with prescriptive rules, which limited the potential complexity of any given system to a series of specific commands, sometimes referred to as ‘good old-fashioned AI’. The novel challenges discussed in this article arise through the introduction of more complex machine learning algorithms. Machine learning is not a new AI development; the term was coined by A.L. Samuel in 1959 and defined as the ‘field of study that gives computers the ability to learn without being explicitly programmed.’ 50 There are several different methods, but all require high-quality data to perform well. Machine learning systems are already embedded into our social reality: powering smart assistants such as Siri and Alexa, detecting spam emails, and curating social media feeds.
Currently, advanced neural networks such as deep-learning techniques have made the most significant breakthroughs because of their versatility in being able to learn from raw data without the need to encode task-specific knowledge. However, the technology comes at a cost in that the systems are intrinsically opaque and are often referred to as ‘black boxes’. For example, a diagnostic machine could be trained on millions of scans that show abnormalities and millions that do not, and learn to categorise the scans, examining them pixel by pixel, with exceptional precision where the data are accurate. However, there is a profound challenge in terms of explicability with these systems. The name ‘neural network’ refers to the metaphorical way in which the processes simulate the neurons of the human brain. The term ‘deep’ refers to the many layers of functions within the system. These mathematical functions cascade through the various layers, adjusting parameters that allow the system to learn from prior outputs and predictions. Explanations of this process become so abstract and mathematically complex that they cannot feasibly be understood in ordinary language, meaning it is not generally possible to understand why a particular result has been achieved. From a criminal liability perspective, this creates a problem for the HCP because the ‘advisory’ output of the system comes on a ‘take it or leave it’ basis.
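The opacity problem can be illustrated with a deliberately minimal sketch. The two-layer network below is not any deployed clinical model; real deep-learning systems differ in scale (millions of parameters, dozens of layers) rather than in kind.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 4)), rng.normal(size=4)  # first-layer parameters
W2, b2 = rng.normal(size=4), rng.normal()              # second-layer parameters

def predict(pixels: np.ndarray) -> float:
    """Cascade the input through the layers and squash the result to a probability."""
    hidden = np.maximum(0, pixels @ W1 + b1)            # ReLU activation
    logit = hidden @ W2 + b2                            # scalar output
    return float(1 / (1 + np.exp(-logit)))              # sigmoid 'probability'

scan_fragment = rng.normal(size=10)                     # stand-in for pixel data
print(predict(scan_fragment))                           # e.g. 0.73 -- but why 0.73?
# The only 'explanation' available is W1, b1, W2 and b2: numbers with no
# clinical meaning. Real systems simply have millions more of them.
```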
The contention that AI is entirely advisory does not necessarily reflect the practical reality of a healthcare setting. It is well established that there are cognitive biases such as AB when decision-support systems are used. The literature shows that AB is prevalent in medical decision-making generally and cannot be reliably removed. Factors such as the complexity of tasks and decision-making under time constraints make AB more likely to occur. Goddard and colleagues describe AB as the process:
by which users tend to over-accept computer output ‘as a heuristic replacement of vigilant information seeking and processing’. AB manifests in errors of commission (following incorrect advice) and omission (failing to act because of not being prompted to do so) when using CDSS [Clinical Decision Support Systems]. 51
It is important to acknowledge the competing justifications for introducing AI systems and the reason that policy makers are so keen to make large financial commitments. The AI will be deployed to support inexperienced HCPs where more experienced consultants may not be available within healthcare resource constraints. Certain groups of clinicians who perform higher risk work are always more at risk of prosecution 52 and the same frontline practitioners may soon be relying on AI systems for decision-support.
To take a hypothetical example, a junior doctor in a highly pressurised Accident and Emergency Department may be treating a recently admitted patient. They follow the correct professional procedure and use an AI system for diagnostic and treatment advice. The system gives a high probability for condition A, together with treatment recommendations, a low probability for condition B, and even lower probabilities for a range of other conditions. The doctor is aware that the system has been tested and has a higher diagnostic accuracy than an experienced specialist. They may be uncertain of the diagnosis and, prior to the implementation of an AI system, may ordinarily have had to double-check with a consultant. The doctor then decides to commence the treatment and move to another patient. However, in this instance one of the less likely conditions is present, and the patient deteriorates and dies.
In this hypothetical scenario it is legally correct to claim that the decision was made (or validated) by the junior doctor and the substantive facts could make the error appear ‘truly exceptionally bad’. 53
To hold the doctor solely responsible is to maintain a legal fiction, because the criminal act will involve the validation of a decision that is uninterpretable. It is fundamentally impossible to critically examine the methodology behind a ‘black box’ decision. The error may appear serious and obvious in hindsight.
One possible way to mitigate this risk is that the HCP could exclude all possible conditions listed by performing every conceivable diagnostic test to compensate for the explainability problem; however, this would undermine the system-level objectives of the AI and increase HCP workload. 54 Proactively carrying out unnecessary and potentially invasive investigations for conditions that have only a marginal probability would render the value of the diagnostic advice irrelevant.
The lack of visibility into the AI decision-making process has led many to argue that ‘black box systems’ should not be used in high-stakes environments like healthcare. 55 In healthcare, it is generally critical to understand the processes behind decisions so that systems can be improved where errors do occur, 56 but decisions made by AI systems are incapable of creating epistemic value in this respect. Therefore, some argue that AI in healthcare should be restricted to more ‘interpretable’ models. 57 However, ‘interpretability’ is a woolly concept with ‘inconsistently applied terminology’ 58 and ‘motives for interpretability and the technical descriptions of interpretable models are diverse and occasionally discordant’. 59
It is possible to reduce the complexity of the systems in a way that makes an explanation possible, 60 but this will invariably make the system less effective. Demanding explainable AI in healthcare may mean foregoing the benefits of deep-learning techniques altogether, so the opacity problems are likely to remain. This raises the ethical question: ‘How much are we willing to lose in prediction accuracy to gain any form of interpretability?’ 61 There is intense academic interest in creating a form of explainable AI. Watson and colleagues note that ‘explanatory breakthroughs have been few and far between’. 62 There have been attempts to create AI algorithms that generate explanations by showing the relevant part of a scan that the system has used to make a decision: one AI guesses what another AI is looking at, 63 but so far this has not been successful in healthcare. 64 The hope of explainable AI may prove to be fool’s gold or a ‘false hope’; however, even if post hoc generative explanations are used, this does not offer a practical solution because it is likely to increase HCP confidence in a less reliable machine for which they remain ultimately responsible.
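To make this concrete, the sketch below shows one common family of post hoc explanation, occlusion sensitivity, in which regions of a scan are masked in turn and the change in the model’s output is recorded. The predict function and patch size are illustrative assumptions, not a description of any deployed system, and the output is only a guess about what the model responded to, not an account of its reasoning.

```python
import numpy as np

def occlusion_map(image: np.ndarray, predict, patch: int = 8) -> np.ndarray:
    """Crude post hoc 'explanation': mask each patch of the image in turn and
    record how far the model's output falls. Large drops are read as regions
    the model relied upon -- a guess about the model, not its reasoning."""
    baseline = predict(image)
    heatmap = np.zeros(image.shape)
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0          # occlude one region
            heatmap[i:i + patch, j:j + patch] = baseline - predict(masked)
    return heatmap
```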
HCPs working in conditions that leave them exposed to psychological factors could be particularly susceptible to prosecution. Psychological factors can significantly undermine the capacity to avoid harm. Merry and McCall Smith illustrate the issue of ‘mind-set’ 65 whereby a professional will see what they expect to see and fail to notice a deviation in repeated tasks. Clinicians are highly likely to see what the AI is advising them to see when they become used to following accurate advice. Gooderham and Toft describe this psychological phenomenon as ‘involuntary automaticity’, explaining that it occurs when a person sees ‘what they expect to see rather than what is actually present’, and describe it as a ‘potent source of medical error’. 66 Gooderham and Toft argue that even skilled professionals may become captured by involuntary automaticity and that this causes them ‘to act on an unconscious and involuntary basis’. 67 This presents a significant danger because judicial commentary has eschewed a subjective fault element in favour of the putative reasonable doctor, where the conduct may be criminal ‘if you think he did something that no reasonable doctor would have done’. 68 If relevant legal tests do not adequately account for the subjective epistemic condition of the clinician, then courts are likely to assert that there is no requirement to consider the defendant’s subjective state of mind, which is likely to overlook the psychological dimensions of AI-induced error. 69
Epistemic vices
This section examines what is likely to be a common problem at the heart of the human–machine interface: the dilemma that is revealed when clinician–AI disagreement arises. What happens when a doctor is unsure about an AI recommendation and believes that it may be making a medical error?
There are many research articles which compare human performance against the performance of machine learning systems, but they often involve a system classifying images alongside a human who is given relatively little time and the results are judged on a one-off decision. 70 This reveals a distinct difference in approach, and machine learning systems may shift clinicians towards more instantaneous decision-making, which the literature has established is a particularly hazardous trait. Both the HCP and the AI system are experts but work in fundamentally different ways. For example, machine learning systems may classify an image by analysing it pixel by pixel and comparing it with thousands of other data sets in a highly mathematical process. Humans would use a more heuristic process involving their experience and judgement. The resulting effect is that both AI and humans will make errors, but they will make different types of errors.
A simple answer to the explainability problem is to accept that the HCP may always follow the system advice because they are entitled to accept that it is accurate: that ‘overruling the advice of the AI system may phenomenologically be experienced as similar to overruling advice given over the phone by a senior colleague’. 71 A crucial distinction when introducing the criminal law is that individual liability can be transferred to another human agent: the senior colleague may become individually legally responsible for this advice in a way that an AI system cannot. Erroneous advice from a senior colleague may be exculpatory for gross negligence and the case law supports this. 72 However, when AI systems are classed as advisory, this will not be the case.
One solution to this challenge would be for a junior doctor to always seek senior-clinician advice and double-check every time they do not unequivocally agree with the AI; however, as stated, this would fundamentally undermine the system-level goals by increasing costs and raising concerns about defensive medical practice.
A second option would be to accept that, when following the AI recommendation, a clinician could never be criminally liable. This option is not likely to materialise because of the way that machine learning works. There is an abundance of evidence in the literature that machine learning systems can make unexpected errors that may look obvious to a human HCP 73 without the effects of AB. There are many notorious examples of spurious methods of classification, such as an image recognition algorithm that differentiated between wolves and huskies by detecting snow in pictures, 74 or a cancer detection algorithm that learns that higher-quality scans indicate cancer. 75 However effective, the AI is not using the same evaluative process as the human HCP.
To take an example of the kind of error envisioned in this article, a machine learning algorithm was used to predict the probability of death among hospital patients with pneumonia. 76 Patients with asthma were systematically classified as lower risk by the algorithm than other patients. This computational determination was fundamentally flawed because asthma patients within the historical data were routinely admitted straight to the intensive care unit where the continuous intensive treatment improved their prognosis, thereby making it appear that they presented less risk. The different clinical pathway skewed the output of the computational model and shows the potential for similar errors: many recurrent error traps may be laid within spurious correlations.
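The mechanism behind this kind of error trap can be illustrated with a deliberately simplified sketch: synthetic data in which asthma patients were escalated to intensive care (and so died less often) produce a model that scores asthma as protective. The data, variables, and model below are illustrative assumptions, not a reconstruction of the study described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
asthma = rng.random(n) < 0.15                 # ~15% of patients have asthma
severity = rng.random(n)                      # underlying illness severity

# Historical confound: asthma patients were escalated to intensive care,
# so their *observed* mortality was far lower at any given severity.
death_probability = np.where(asthma, 0.05 * severity, 0.20 * severity)
died = rng.random(n) < death_probability

features = np.column_stack([asthma, severity])
model = LogisticRegression().fit(features, died)

print("asthma coefficient:", round(model.coef_[0][0], 2))   # negative value
# A negative coefficient means the model scores asthma as *protective*:
# deployed naively, it would advise that asthmatic patients are lower risk.
```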
It is the nature of AI mistakes that presents a danger to clinicians, not simply the fact that they occur. AI systems can be both remarkable and ridiculous, with aspects that are superhuman and subhuman. Marcus and Davis describe AI systems as ‘digital idiots savants’; 77 if they are right, it presents obvious dangers when such systems are introduced into frontline care. Where HCPs validate unexpected errors, it could be ‘truly exceptionally bad’ 78 and hindsight may suggest that a clear and obvious risk of death was present. If an asthma patient died shortly after being discharged as low risk, it is highly plausible that a criminal complaint may be made. Where other HCPs pick up a potential system error, it may look damning to the HCP who does not.
HCPs will therefore remain duty-bound to consider carefully whether the advice that they are given is valid and appropriate. 79 As Pasquale observes, ‘there will always be a place for domain experts to assess the accuracy of the AI advice and check how well it works in the real world.’ 80 Since there could be circumstances where an AI recommendation is obviously flawed, it presents a challenge for HCPs: they will be required to act as the ‘common-sense filter’. They will do so under the effects of existing system pressures and the psychological factors caused by AI systems.
HCPs therefore must consider that many of the recommended decisions could be either highly counter-intuitive insights 81 or the type of brittle errors that occurred with the pneumonia algorithm. 82
Any recommendation that challenges the HCP’s own diagnosis creates this epistemic dilemma. HCPs are capable of making omission errors when they fail to follow accurate advice and commission errors when they follow erroneous advice. The epistemic vice is a metaphor that explains the pressure that this dilemma will place upon HCPs, and it is highlighted as a significant practical ethical challenge by Grote and Berens:
Now, in the relevant philosophical debate, there are different theories about what would be reasonable for the clinician to do. According to the ‘Equal Weight View’, learning that an epistemic peer’s proposition differs from your own should diminish the confidence in one’s judgement. Hence, deferring to the algorithm is the most reasonable choice. By contrast, the ‘Steadfast View’ emphasises the epistemically privileged status of one’s own beliefs, which is why it is reasonable for the clinician to stick to her proposition. Therefore, we end up with a stalemate. 83
If the HCP may reasonably take either position, then this leaves them potentially exposed to the legal and moral consequences either way. An AI system could never be considered responsible when its recommendations are not followed, but HCPs may still be held accountable when erroneous AI advice is followed.
The new dimension of moral luck
A long-standing criticism of GNM is that moral luck determines who is (and who is not) culpable of manslaughter: death is the threshold that engages the offence. 85 Two HCPs may make the same substantive error and both patients may become critically ill. One patient may survive and the other may not; the stakes are high, and the outcome is binary: prosecution or no prosecution. Once the critical error has occurred, the HCP’s agency has ended and where events go from there is largely a matter of luck.
In JC Smith’s famous example, a father leaves a colourless weedkiller in a bottle of lemonade, accessible to his young child. 86 If the child drinks the poison and dies, the father will be prosecuted; if the child does not drink the poison, the father faces no action. His conduct is no less culpable, irrespective of the outcome. Smith demonstrates that moral luck extends forward in a temporal dimension, where punishment largely depends on factors outside the actor’s control. 87 The introduction of AI systems creates a new dimension of moral luck extending backwards in a temporal dimension. Outcome luck has always permeated the criminal law, 88 but AI systems demand an analysis of input luck, which has largely been ignored in medical manslaughter cases to date. 89 The putative ‘reasonably competent doctor’ is an abstract construct that exists in a rational metaphysical space free from the vagaries of bad luck, exhaustion, and system pressures beyond their control.
AI development is often referred to as the ‘AI lifecycle’ to describe the many stages that are required to build machine learning systems including: the design of the product; the acquisition of the data; creating and evaluating the model; and then deployment. 90 There are many points of potential failure and safety risks exist in data collection, product development, as well as clinical use of AI. 91 As demonstrated, AI can be brittle, 92 but there are other significant safety risks such as: ‘concept drift’ where AI systems become less effective when they leave the training data behind; adversarial attacks that can easily mislead a model; 93 data poisoning where training data are compromised; or unsupervised machine learning systems that may realise objectives by taking dangerous actions that could not be anticipated by their creators. 94 Creating reliable, secure, and robust AI systems is a complex sociotechnical endeavour involving many actors and processes. This is a clear example of what the philosophical literature refers to as ‘the problem of many hands’. 95
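Some of these lifecycle risks can at least be monitored in deployment. The sketch below shows one minimal approach to detecting ‘concept drift’: comparing the distribution of a single input feature in live clinical data with the data the model was trained on. The feature, threshold, and populations are illustrative assumptions rather than a description of any real monitoring pipeline.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(training_feature: np.ndarray, live_feature: np.ndarray,
                threshold: float = 0.01) -> bool:
    """Flag possible 'concept drift' on one input feature: a small p-value from
    a two-sample Kolmogorov-Smirnov test suggests the live population no longer
    resembles the data the model was trained on."""
    p_value = ks_2samp(training_feature, live_feature).pvalue
    return bool(p_value < threshold)

rng = np.random.default_rng(2)
training_age = rng.normal(55, 12, size=5_000)   # population the model learned from
live_age = rng.normal(72, 10, size=500)         # older population now being scanned
print(drift_alert(training_age, live_age))      # True: outputs may no longer be safe
```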
While the nature of the errors may be highly technical and difficult to detect, the errors may materialise in ways that are socially untenable and likely to provoke anger and resentment when discovered. An example garnering much attention is that AI applications are particularly susceptible to bias. 96 Bias in training data may occur because the data sources themselves may not reflect the true epidemiology within a given demographic. 97 This means that errors may be far more likely to occur in under-represented groups such as ethnic minorities, 98 women, 99 and those with disabilities. 100
In the United States, an algorithm used to allocate healthcare resources had been widely discriminating against African Americans; as a result, they were far less likely than white people to be referred for treatment when equally sick. 101 The proprietary aspect of many algorithms may make this difficult to detect, which may cause further harm by leaving errors undiscovered. Another example is that skin cancer detection algorithms may be less effective on darker skin. 102 This is an issue that could have severe consequences where machine learning is used in safety-critical scenarios.
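Disparities of this kind can be surfaced, if not prevented, by routinely auditing a system’s performance separately for each demographic group. The sketch below shows such an audit on toy data; the groups and figures are illustrative only.

```python
import numpy as np

def error_rate_by_group(y_true: np.ndarray, y_pred: np.ndarray,
                        group: np.ndarray) -> dict:
    """Report the misclassification rate separately for each demographic group,
    the kind of audit that can surface the disparities described above."""
    return {g: float(np.mean(y_true[group == g] != y_pred[group == g]))
            for g in np.unique(group)}

# Toy audit: a system that looks accurate overall but is much worse for one group.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0, 1, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(error_rate_by_group(y_true, y_pred, group))   # {'A': 0.0, 'B': 0.75}
```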
If AI systems are not transparent and explainable, then they cannot be reliably detached from other aspects of the sociotechnological system, making it difficult to identify and correct errors. Therefore, structural inequalities in data are distilled into unsafe recommendations before they are validated by the HCP. Errors may also remain in the system because data are inaccurately labelled. There is currently no clear regulator for data quality, which may leave HCPs facing the consequences of a poorly designed data set that produces dangerous and discriminatory outcomes. 103
Where these systemic failures exist, AI systems may be doomed to fail for particular patients at a particular time. There are already concerns that serious heart problems have been under-diagnosed when women use a primary care AI chatbot. 104 Those with a rare condition, uncommon co-morbidity, or those susceptible to a rare drug interaction may find the AI system makes an error that was fated to occur in the system. The asthma example will not be the last time an AI makes a dangerous highly counter-intuitive correlation. Mistakes like this may become silent killers within otherwise highly accurate systems and the HCP standing in front of the computer at this juncture may find themselves set up to fail, whereas the next clinician to use the system may find the computer working within the expected parameters on a paradigmatic case. Both clinicians may carefully follow the same processes, but one HCP may find that they are being investigated for a fatal error when they over-rely on the AI. These types of errors are impossible to detect; therefore, where and when they materialise will largely be a matter of luck for the clinician. Returning to Smith’s example, for clinicians using AI, the bottle of lemonade is already poisoned before it is opened. It is served up in good faith and if someone ingests it, they will point a finger towards the doctor that poured the drink.
GNM: the unexplainable law
The introduction of AI systems will present significant challenges to applying the established principles of the criminal law for fatal errors. However, the present GNM legal landscape is already far from clear: AI is not alone in having explicability problems.
At a doctrinal level, the fundamental rationale for prosecuting offences of GNM was set out by Lord Hewart CJ in Bateman: the negligence must go ‘beyond a mere matter of compensation between subjects’ and show ‘such disregard for the life and safety of others as to amount to a crime against the State and conduct deserving punishment’.
The current legal paradigm is derived from a patchwork of poor decisions and perplexing judgements that arguably demonstrate ‘an exhibition of the common law at its worst’. 106
The confusion and uncertainty arise from several aspects of the development of the law, including the historical swing between recklessness (advertence) and gross negligence (inadvertence) as the basis for liability, and the way the Court of Appeal has subsequently refined the test.
The foundation of the present legal test begins with the decision of the House of Lords in Adomako:
The ordinary principles of the law of negligence apply to ascertain whether or not the defendant has been in a breach of duty of care towards the victim who has died. If such a breach of duty is established, the next question is whether that breach of duty caused the death of the victim. If so, the jury must go on to consider whether that breach of duty should be categorised as gross negligence and therefore a crime. 122
The judgement is confusing and contradictory in many respects. Lord Mackay confirms that gross negligence rather than recklessness is the basis of liability but offers little further clarification stating that gross is simply a matter of degree and that ‘to specify that degree more closely is I think likely to achieve only a spurious precision’. 123
At times, the terms ‘recklessness’ and ‘gross negligence’ have been used interchangeably in judgements, which has amplified the confusion. Lord Mackay rejects a subjective recklessness standard on one hand but states that it is ‘perfectly open to the trial judge to use the word “reckless” in its ordinary meaning as part of his exposition of the law’. 124
It is not clear what this ‘ordinary meaning’ of recklessness adds to, or subtracts from, the concept of gross negligence.
The judgement was not generally well received, with criticisms that ‘prosecutors, experts, judges and juries are thus left to grapple with a difficult and circular concept’. 125 Juries are ultimately responsible for determining when conduct has crossed the line from civil to criminal liability. However, jurors are likely to over-rely on expert witnesses 126 which ‘underplays the risk of jury usurpation by investing too much epistemic authority in the expert’. 127 Therefore, understanding when the criminal law should be invoked is difficult because the offence is ill-defined, potentially broad in scope, and difficult to apply consistently.
Judicial analysis has comprehensively failed to adequately capture the meaning of ‘gross’ without cycling through synonyms. Perhaps most famously, Leveson J, in both Sellu and Rose, described the required standard as conduct that was ‘truly exceptionally bad’.
An opportunity to clear up the confusion presented itself in the Court of Appeal in another medical manslaughter case involving a misdiagnosis and untreated infection.
The law as it currently stands
In Rose, the Court of Appeal quashed the conviction of an optometrist who had failed to examine retinal images taken during a routine eye test and so failed to detect the papilloedema that ultimately killed her patient. In quashing the conviction, the Court of Appeal tweaked the legal test: the ‘serious and obvious risk of death’ must be assessed prospectively, on the basis of what the defendant actually knew at the time of the breach, and not on the basis of information they would have had if they had not been negligent.
The decision in Rose means that a suspect who has assessed risk and subsequently failed to react appropriately, despite being in a position to appreciate the risk, may be culpable, while another, who has completely failed to assess risk, will have no case to answer, thus benefiting from their self-inflicted ignorance. 144
While this decision has putatively raised the threshold for prosecution and arguably diverged from the earlier authorities, it may offer limited comfort to HCPs working with AI systems.
Where the door to criminal liability has closed for errors of negligent ignorance, the use of AI could reopen it: an HCP who has been presented with an AI risk prediction cannot later claim to have been unaware of the relevant risk at the time of the breach.
To demonstrate the likely intersection of AI-assisted decision-making and the criminal law, the following section compares the factual backgrounds of recent GNM prosecutions with the hypothetical implications of using existing AI systems in similar circumstances. The overlap between the AI systems already in use and the type of clinical work recently captured by the gravity of the criminal law highlights the need to address this problem.
AI systems and manslaughter convictions
In 2016, Moorfields Eye Hospital NHS Foundation Trust entered a research partnership with DeepMind (whose health work has since been folded into Google Health) to use AI to detect and diagnose serious eye conditions from the 5,000 optical coherence tomography (OCT) scans that are performed every week. 145 The system focussed on 53 key diagnoses relevant to NHS pathway referrals. The system was accurate 94% of the time, 146 and the performance in making recommendations ‘reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans’. 147
An AI system routinely performing OCT scans will be particularly noteworthy to healthcare lawyers familiar with the facts of Rose, in which an optometrist failed to identify signs of papilloedema on retinal images taken during a routine eye examination and was convicted of GNM, although the conviction was later quashed on appeal.
AI systems within the sphere of primary care are likely to become commonplace. There are already putative AI systems making ‘diagnoses’ in the NHS. Babylon Health, formed in 2014, created the ‘GP at Hand’ app to fulfil its mission to use technology to improve access to healthcare. It claims to have developed AI that can diagnose medical conditions and offers digital access to a GP via video chat on smart devices. The symptom checker is constantly available, and videoconferencing can be arranged at short notice. The GP is given a predictive diagnosis via AI prior to the conference. The Babylon CEO Ali Parsa claimed that within a short space of time the AI would be able to diagnose and plan treatment ‘better than any human doctor’; 149 a claim that has been met with some scepticism from the medical community 150 where evidence remains immature 151 and there are concerns that diagnoses may have been missed. 152
AI has also shown credible potential to impact patient care in planning treatment. An example of an avenue of deployment is in the treatment of sepsis, which is the third leading cause of death worldwide, as well as the most common cause of mortality in hospitals. Sepsis treatment requires careful management of intravenous fluids and vasopressors, and suboptimal decision-making leads to poorer outcomes. In research by Komorowski and colleagues, an AI system used a reinforcement learning agent to examine a large data set, and the results showed that the treatment selected by the AI system was on average reliably better than that selected by human clinicians. 155 There is much hope that computational models like this can enhance clinical decision-making and improve patient outcomes in the future by reducing space for human error.
The medical manslaughter convictions of Dr Bawa-Garba and Mr Sellu both followed avoidable deaths involving sepsis, precisely the kind of deterioration that such systems are designed to help clinicians manage.
Another similar avoidable death occurred in the case of Drs Misra and Srivastava, who failed to properly diagnose an infection that led to fatal toxic shock. 160 They were convicted of GNM and their subsequent appeal failed. The facts laid out in the judgement show something quite revealing: the condition was rare and ‘given the rarity may not amount to negligence at all’. 161 There is no doubt that both doctors responded to the patient’s symptoms; however, the misdiagnosis was fatal, and the patient continued to deteriorate and died under their care. 162 It remains to be seen whether rare cases may still slip through the net with AI systems: it is very difficult to train and test AI systems for rare events because such systems require large, relevant data sets to learn from.
Where AI systems are accurate, they still present a double-edged sword to clinicians: they may avert many of the errors outlined above, but where subsequent systemic failures result in inadequate care, the likelihood of successful prosecutions against individual clinicians could increase. This will then create a potent incentive for clinicians to adhere strictly to AI recommendations in delivering care. However, where the AI systems are not accurate and deliver erroneous recommendations, this creates a profound challenge for HCPs. AI errors are likely to be highly counter-intuitive and unpredictable, and it is unreasonable to expect that they can always be picked up in the current medical practice paradigm, which creates the risk of AI-induced criminal liability.
Responsibility of the HCP
This article has demonstrated that there are many circumstances in which it is predictable that even a careful, diligent, and reflective HCP may act on erroneous advice. The system-level objectives will demand that AI systems be used by HCPs who are not at the top of the HCP hierarchy and who are confident that the AI system performs (on average) better than they do at the particular classification task at hand. In such contexts it would be perverse not to accept that HCPs are likely to over-rely on the AI advice, and they should not be criminally liable in cases where they do.
It may be tempting to dismiss the likelihood of prosecutions, but there is a history of humans in the loop being held responsible for errors involving automated and assisted decision-making. Elish argues that humans in complex systems act like a ‘moral crumple zone, like a car bonnet designed to absorb the force of impact in a crash’ and suffer the ‘moral and legal penalties when the system fails’. 163 A relevant example of what may lie ahead for HCPs involves the prosecution of Rafaela Vasquez in the United States for a fatal accident involving a ‘self-driving car’. 164 The autonomous vehicle, owned by Uber, failed to stop automatically when Elaine Herzberg was crossing the road with her bicycle, and a collision occurred at 39 mph, which proved fatal. The accident occurred at the peak of the hype around autonomous vehicles and created much speculation around potential liability. 165 The ‘backup driver’ Rafaela Vasquez was blamed and is alleged to have been distracted by her mobile phone, but the National Transportation Safety Board also found that multiple complex factors contributed to the accident including, most notably, Uber’s inability to address ‘automation complacency’. 166 Rafaela Vasquez is awaiting trial, while Uber has not been prosecuted for any of the failures that contributed to the crash. 167 This single individual prosecution, which seeks to parse individual responsibility from complex interconnected failings, bears much similarity to medical manslaughter prosecutions.
There is also likely to be some theoretical weight behind the idea that doctors should be responsible for AI decisions. Hart supported the classification of negligent conduct as criminal provided the individual was of normal capacity. 168 There are contemporary academics who have been largely supportive of the concept of negligent liability in criminal law. 169 The basis for this vision of moral responsibility is that where the HCP has a duty and capacity to avoid harm, then they are culpable when that harm occurs. The potential risk for HCPs is that they will always retain the technical capacity to avert the harm because they could theoretically ignore erroneous AI advice and will have a legal duty to do so where it is obvious to them. The HCP will be judged objectively on what amounts to ‘truly exceptionally bad’ 170 and this might not take account of the effects of AB or the epistemic vice. If the jury believe that the reasonably competent doctor should not have allowed the error to happen, then the HCP may be convicted.
There are principled objections to gross negligence because of the lack of any intent to do harm: there simply is not enough subjective fault to justify the stigma of a homicide conviction.
The concept of subjective recklessness is well developed in judicial commentary. Following the legal analysis set out in the criminal damage case of R v G, a defendant acts recklessly where they are aware of a risk that a result will occur and it is, in the circumstances known to them, unreasonable to take that risk.
A potential objection to adopting a legal standard of subjective recklessness is that there may be practical difficulties in establishing subjective fault; however, this may be over-stated because the jury remains entitled to find the defendant’s account unconvincing. Quick argues that ‘any such worry that prosecutors couldn’t prove subjective awareness of risk is an exaggerated one.’ 185 With AI systems, these objections may be further diminished because the nature of the system lays bare the frame of the HCP’s subjective epistemic condition: HCPs will be given AI recommendations and, when these are followed in good faith, that should not meet the threshold of criminality.
Another popular objection to a subjective recklessness standard may be that allowing the most catastrophic examples of AI-induced errors to go unpunished could lead to the removal of a potent deterrent. Glanville Williams accepted this utilitarian justification for negligent liability, reasoning that the threat of sanction would encourage improved standards of behaviour. 186 However, this utilitarian justification for criminal negligence is inextricably linked to the efficacy of its deterrent effect: it must stand or fall on whether it works. AI-induced errors are likely to be complex, unexplainable, and to involve a psychological dimension. They are not the kind of errors that can be deterred, and punishing them will do nothing to improve safety. Instead, punishing inadvertent AI-induced errors may contribute to a climate of fear and undermine the policy initiatives to create a fairer response to medical error. It is not controversial to state that prosecutions do not help promote a culture of candour and reporting errors; as Merry and McCall Smith argued: ‘blaming the person “holding the smoking gun” may simply leave the scene set for a re-occurrence of the same tragedy.’ 187 Prosecuting doctors for AI-induced GNM is likely to create three undesirable social consequences: it is likely to be inimical to safety; it could undermine system goals by incentivising defensive practice when using AI; and it may damage trust and fatally undermine the adoption of beneficial technologies. Where the criminal law is likely to have such negative social consequences, there is a strong argument for taking a minimal approach. As Husak argues, it is wrong to create an offence or set of offences where this might cause greater social harm than leaving the conduct outside the criminal law. 188
Conclusion
The potential sources of AI-induced error are manifold and complex. The nebulous concept of ‘gross negligence’ is impossible to define with precision and will be profoundly difficult to apply to AI-induced errors. A legal test of subjective recklessness could reduce the risk of an AI-induced error falling within the ambit of the criminal law and, on balance, presents a better option. If the policy aims for the AI healthcare system are realised, then an increasing proportion of medical decisions may soon involve AI systems. Therefore, the inevitability of AI-induced errors in healthcare could have significant implications for the application of the criminal law to the HCPs who validate those decisions.
The AI ethical literature highlights maintaining societal trust as fundamental to realising the potential benefits of AI technologies. 189
The problem with the criminal law is that it can erode trust in two ways: when it is applied unjustly; and when it is not used at all, creating a perception that nobody has been held responsible. Therefore, adopting a more minimal application of the criminal law 190 alone as a solution to these types of errors will remain unconvincing without a just response to AI-induced medical error. AI systems make a strong case for moving towards more patient-centric responses to errors. The patiency perspective involves two critical aspects that require different legal and ethical frameworks.
The aim of this conclusion is not to introduce new concepts at a late stage, but to highlight that there are extant arguments for more patient-centric approaches to adverse medical events, which should be given renewed examination through the lens of AI-induced errors. For example, no-fault systems may be necessary to address AI errors 191 and non-contentious legal responses could help avoid criminal complaints where the police station is the option of last resort. Engaging with what a victim needs is the appropriate response from a relational framework asking: Who has been affected? and What are their needs? It is also correct to recognise that the human HCPs are also patients of AI systems. Incorporating the policy aims of creating fairer systems in healthcare will require that these relational questions are also asked of the clinicians caught up in AI-induced errors.
Footnotes
Acknowledgements
I would like to thank Dr Sarah Devaney, Prof Soren Holm, Dr Alex Mullock, and Ms Claire Beck for comments on an earlier draft. I would also like to thank the anonymous reviewers.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
1.
2.
C Dyer, ‘Bawa Garba Case Has Left the Profession Shaken and Stirred’,
3.
4.
There has been much criticism from the medical profession and a crowdfunding campaign raised £200,000 to challenge the ruling. A total of 8,000 doctors signed a letter in opposition to the ruling citing the damaging effect on openness and patient safety.
5.
6.
[1995] 1 AC 171.
7.
[1995] 1 AC 171 (Lord Mackay LC) [187].
8.
[2016] EWCA Crim 1716.
9.
[2017] EWCA Crim 1168.
10.
[2020] EWCA Crim 1093.
11.
[2016] EWCA Crim 1716 [152].
12.
D. Griffiths and A. Sanders, ‘The Road to the Dock: Prosecution Decision-Making in Medical Manslaughter Cases’ in D. Griffiths and A. Sanders, eds.
13.
R. Ferner and S. McDowell, ‘Doctors Charged With Manslaughter in the Course of Medical Practice, 1795–2005: A Literature Review’,
16.
For example, following the conviction of Honey Rose in 2016, Detective Superintendent Antonis from Suffolk Police made the statement: ‘If this case makes the optometry profession reflect on their practices and review their policies to prevent it happening to anyone again, or encourages other parents to take their children to get their eyes tested with the knowledge that any serious issues would be picked up, then it will be worthwhile’, available at
(accessed 1 May 2023).
17.
O. Quick, ‘Medicine Mistakes and Manslaughter: A Criminal Combination?’,
19.
The Williams Review was an independent review of gross negligence manslaughter in healthcare in response to the high-profile case of Dr Bawa-Garba. The review was led by Professor Norman Williams in 2018 which as part of its remit considered concerns by the medical profession that simple errors could result in prosecutions, even in a broader context of systemic failings.
The Hamilton Review was commissioned in 2018 by the General Medical Council (GMC) in the aftermath of the Dr Bawa-Garba case. Dr Leslie Hamilton found that trust had been damaged between the medical profession and the regulator, and he set out several recommendations to build trust which were accepted by the GMC. Both the Williams Review and the Hamilton Review were significant in terms of their potential impact on the healthcare system, and their recommendations have been closely scrutinised by health professionals, policymakers, and the public, available at
(accessed 1 May 2023).
20.
The key judgements in the development of the law
21.
22.
See Quick, ‘Medicine Mistakes and Manslaughter: A Criminal Combination?’; A. Lodge, ‘Gross Negligence Manslaughter on the Cusp: The Unprincipled Privileging of Harm Over Culpability’,
23.
The term was first used in 1956 by John McCarthy at a Dartmouth College Academic Conference.
24.
See D. Crevier,
25.
I. Hasham, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, and Samee Ullah Khan, ‘The Rise of Big Data on Cloud Computing: Review and Open Research Issues’,
26.
See EU definition: ‘Artificial Intelligence refers to systems that display intelligent behaviour by analysing their environment and taking actions-with some degree of autonomy to achieve specific goals. AI-based systems can be purely software based, acting in the virtual world (voice assistants, image analysis software, search engines, speech and face recognition systems) or AI can be embedded in hardware devices eg advanced robots, autonomous cars, drones or internet of things applications’. Communications from the Commission to the European Parliament, The European Council on Artificial Intelligence for Europe 25.4.2018 237.
29.
L. Floridi,
30.
I. Bartoletti, ‘AI in Healthcare: Ethical and Privacy Challenges’ in D. Riaño, S. Wilk, and A. ten Teije, eds.,
33.
E. Topol, ‘The Topol Review: Projected Impact on NHS Workforce’, fig 1 at 27. Available at https://topol.hee.nhs.uk/the-topol-review/#:~:text=About%20the%20Topol%20Review&text=The%20Topol%20Review%2C%20led%20by,to%20deliver%20the%20digital%20future.
34.
S. Chilamkurthy, R. Ghosh, S. Tanamala, M. Biviji, N. G. Campeau, V. K. Venugopal, V. Mahajan, P. Rao, and P. Warier, ‘Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans: A Retrospective Study’,
35.
N. Houssami, G. Kirkpatrick-Jones, N. Noguchi, and C. I. Lee, ‘Artificial Intelligence (AI) for the Early Detection of Breast Cancer: A Scoping Review to Assess AI’s Potential in Breast Screening Practice’,
36.
N. Gupta, Deepak Gupta, Ashish Khanna, Pedro P. Rebouças Filho, and Victor Hugo C. de Albuquerque, ‘Evolutionary Algorithms for Automatic Lung Disease Detection’,
37.
T. J. Brinker, A. Hekler, A. H. Enk, C. Berking, S. Haferkamp, A. Hauschild, M. Weichenthal, J. Klode, D. Schadendorf, T. Holland-Letz, C. von Kalle, S. Fröhling, B. Schilling, and J. S. Utikal, ‘Deep Neural Networks Are Superior to Dermatologists in Melanoma Image Classification’,
38.
M. Havai, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, and Hugo Larochelle, ‘Brain Tumour Segmentation With Deep Neural Networks’,
39.
Y. Sim, Myung Jin Chung, Elmar Kotter, Sehyo Yune, Myeongchan Kim, Synho Do, Kyunghwa Han, Hanmyoung Kim, Seungwook Yang, Dong-Jae Lee, and Byoung Wook Choi, ‘Deep Convolutional Neural Network–Based Software Improves Radiologist Detection of Malignant Lung Nodules on Chest Radiographs’,
40.
F. Pasquale,
41.
P. Case, ‘The Jaded Cliché of Defensive Medical Practice, From Magically Convincing to Empirically Unconvincing’
43.
See, for example, supreme court judgement in
44.
It is important to note that, for the foreseeable future, the AI systems will be introduced iteratively and will perform very narrow tasks. There will not be a single AI system that takes on the work of a doctor across various fields: rather the system may analyse a particular scan for a specific investigation with a high degree of accuracy, having been trained on more examples than a human HCP could see in a lifetime.
45.
A. Froomkin, Ian R. Kerr, and Joelle Pineau,
46.
The NIHR Innovation Observatory Horizon scanning exercise for NHSX shows that there are now 132 AI products that have been developed, covering 70 different conditions.
47.
While there are other forms of AI systems in development that interact with patients directly, including home diagnostic kits and monitoring devices, they are not considered in this article; Artificially Intelligent Advisory Systems are expected to be introduced in the near term and it is the AI and HCP interface that presents the immediate legal implications for criminal liability for fatal errors. Therefore, since advisory systems are discussed, this article only considers the criminal liability of the individual clinician validating AI advice.
48.
In some cases, AI systems could be fully automated without human intervention. Closed loop systems are designed to learn and adapt to continual feedback. For example, in automated vehicles, myriad sensors analyse the environment and feed the data back into the control of the system to adjust speed, direction, and other parameters.
49.
See Article 22 GDPR: a person has the right ‘not to be solely subject to a decision based on automatic processing’ (Regulation EU 2016/679); according to the Data Protection Working Party this provision applies to ‘decisions that affect someone’s access to health services’ and ‘it should be carried out by someone who has the authority and competence to change the decision’. See Article 29 Data Protection Working Party 2018
50.
A. L. Samuel, ‘Some Studies in Machine Learning Using the Game of Checkers’,
51.
K. Goddard, A. Roudsari, and J. C. Wyatt, ‘Automation Bias: A Systematic Review of Frequency, Effect Mediators, and Mitigators’,
52.
S. McDowell, Harriet S. Ferner, and Robin E. Ferner, ‘The Pathophysiology of Medication Errors: How and Where They Arise’,
53.
[2016] EWCA Crim 1716. The description by Leveson J of ‘gross’ as ‘truly exceptionally bad’ is an important part of the legal test in recent judgements and forms the conceptual foundation for the jury in determining the threshold of criminal culpability.
54.
It is important to note that reducing costs is not the only system-level objective of healthcare generally. In a publicly funded health system, ensuring universality of care and providing safe and effective treatment are also system-level objectives. However, in the UK, improving efficiency and controlling costs are consistently set out as core policy objectives of AI technologies in health.
55.
A. Campolo, Madelyn Sanfilippo, Meredith Whittaker, and Kate Crawford, ‘AI Now Report 2017’, available at
; S. Robbins, ‘A Misdirected Principle With a Catch: Explicability for AI’,
56.
The Berwick Review, 2013.
57.
C. Rudin, ‘Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead’,
58.
59.
Z. Lipton, ‘The Mythos of Model Interpretability’,
60.
For example, restricting systems to linear regression or decision trees.
61.
R. Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi, ‘A Survey of Methods for Explaining Black Box Models’,
62.
S. Watson, J. Krutzinna, I. N. Bruce, C. E. Griffiths, I. B. McInnes, M. R. Barnes, and L. Floridi, ‘Clinical Applications of Machine Learning Algorithms: Beyond the Black Box’,
63.
See A. Saporta, Xiaotong Gui, Ashwin Agrawal, Anuj Pareek, Steven Q. H. Truong, Chanh D. T. Nguyen, Van-Doan Ngo, Jayne Seekins, Francis G. Blankenberg, Andrew Y. Ng, Matthew P. Lungren, and Pranav Rajpurkar, ‘Deep Learning Saliency Maps Do Not Accurately Highlight Diagnostically Relevant Regions for Medical Image Interpretation’, 2021, available at
(accessed 1 May 2023).
64.
M. Ghassemi, Luke Oakden-Rayner, and Andrew L. Beam, ‘The False Hope of Current Approaches to Explainable Artificial Intelligence in Healthcare’,
65.
A. Merry and A. McCall Smith,
66.
P. Gooderham and B. Toft, ‘Involuntary Automaticity and Medical Manslaughter’,
67.
Op. cit., p. 178.
68.
What exactly amounts to the standard of the reasonably skilled doctor is far from clear and is largely left to the jury. In
69.
Attorney General’s Reference (No 2 of 1999).
70.
Liu and colleagues found that only 4 out of 82 studies examined in a systematic review allowed clinicians access to additional information that they would have in clinical practice. X. Liu, L. Faes, A. U. Kale, S. K. Wagner, D. J. Fu, A. Bruynseels, T. Mahendiran, G. Moraes, M. Shamdas, C. Kern, J. R. Ledsam, M. K. Schmid, K. Balaskas, E. J. Topol, L. M. Bachmann, P. A. Keane, and A. K. Denniston, ‘A Comparison of Deep Learning Performance Against Health Care Professionals in Detecting Diseases From Medical Imaging: A Systematic Review and Meta-Analysis’,
71.
S. Holm, Catherine Stanton, and Benjamin Bartlett, ‘A New Argument for No-Fault Compensation in Healthcare: The Introduction of Artificial Intelligence Systems’,
72.
See, for example, the case of
73.
There are numerous examples of the fragile nature of AI, including autonomous vehicles being easily fooled into breaking speed limits by tape: P. O’Neill, ‘Hackers Can Trick a Tesla Into Accelerating by 50 mph’, MIT Technology Review; or an AI automatic camera system repeatedly confusing an assistant referee’s bald head with the football during sports coverage, 2020, available at
(accessed 1 May 2023). For more on the AI common sense problem see: E. Davis and G. Marcus, ‘Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence’,
74.
75.
For example, an algorithm may learn to infer that when clinicians are more concerned, patients are sent straight to a specialist centre, which uses a higher-quality scanner. For more about spurious correlations and healthcare, see R. Challen, Joshua Denny, Martin Pitt, Luke Gompels, Tom Edwards, and Krasimira Tsaneva-Atanasova, ‘Artificial Intelligence, Bias and Clinical Safety’,
76.
R. Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad, ‘Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-Day Readmission’, KDD ’15 ACM, (Sydney, August 2015).
77.
G. Marcus and E. Davis,
78.
[2017] EWCA Crim 1168.
79.
The duty required is to ensure that the HCP thinks carefully about advice in the way that any HCP would in similar circumstances. This duty to ‘act in accordance with a practice accepted as proper by a responsible body of medical men skilled in that particular art’ is well established through
80.
Pasquale,
81.
82.
Other examples exist, such as the crash of Air France 447 on 1 June 2009. The aircraft’s angle of attack exceeded the valid parameters of the stall warning system, so it warned of a stall only when the pilots began to correct the aircraft. The misleading stall warning confused the pilots and was a critical factor in the crash. Final Report, available at
(accessed 1 May 2023).
83.
T. Grote and P. Berens, ‘On the Ethics of Algorithmic Decision-Making in Healthcare’,
85.
J. C. Smith, ‘The Element of Chance in Criminal Liability’,
86.
Op. cit., p. 66.
87.
The extent to which developers or other actors within the AI lifecycle could be held criminally responsible for AI errors lies outside the scope of this article. In England and Wales, both at individual and organisational levels of liability, the law has arguably been insufficient to address serious systemic failings involving various levels of decision-making in healthcare. Therefore, under the current paradigm, there is likely to be a negligible risk of criminal liability to anyone other than a frontline clinician. It is worth noting that a test of subjective recklessness will not necessarily impact the extent to which other actors within the AI lifecycle or healthcare system decision-makers would be at risk of criminal liability from failing AI technologies. For more discussion on systemic and organisational failure and the criminal law in healthcare see: M. Kazarian, ‘Who Should Be Responsible for Healthcare Failings?’,
88.
R. Duff, ‘Whose Luck Is It Anyway?’ in C. M. V. Clarkson and S. R. Cunningham, eds.,
89.
See, for example, the error in misreading the DNR in
90.
See the ‘Cross Industry Standard Process for Data Mining’ (CRISP-DM) or the Microsoft Team Data Science Process (TDSP) for methods describing the AI lifecycle.
91.
E. Vayena, Alessandro Blasimme, and I. Glenn Cohen, ‘Machine Learning in Medicine: Addressing Ethical Challenges’,
92.
Marcus and Davis, ‘Rebooting AI’.
93.
K. Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song, ‘Robust Physical-World Attacks on Deep Learning Models’, ArXiv:1707.08945 [Cs], 2018, DOI: 10.48550/arXiv.1707.08945.
94.
D. Leslie,
95.
H. Nissenbaum, ‘Accountability in a Computerised Society’,
96.
D. Schönberger, ‘Artificial Intelligence in Healthcare: A Critical Analysis of the Legal and Ethical Implications’,
97.
A. Rajkomar, M. Hardt, M. D. Howell, G. Corrado, and M. H. Chin, ‘Ensuring Fairness in Machine Learning to Advance Health Equity’,
98.
H. Ledford, ‘Millions of Black People Affected by Racial Bias in Healthcare Algorithms’,
99.
C. Criado Perez,
100.
101.
Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, ‘Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations’,
102.
104.
S. Das, ‘It’s Hysteria, Not a Heart Attack, GP Babylon App Tells Women’,
.
105.
[1925] 94 LJKB 791 [13].
106.
Lodge, ‘Gross Negligence Manslaughter on the Cusp’, p. 125.
107.
[1994] 98 Cr App R 262.
108.
[1995] 1 AC 171.
109.
110.
[2016] EWCA Crim 741.
111.
[2017] EWCA Crim 1716.
112.
[1925] 94 LJKB 791 (Lord Hewart CJ) [13].
113.
114.
115.
116.
117.
118.
119.
A. Ashworth and J. Horder,
120.
C. Crosby, ‘Gross Negligence Manslaughter Revisited: Time for a Change of Direction?’
121.
The four-part test was set out as follows: Did the doctor show obvious indifference to the risk of injury to his patient? Was he aware of the risk but nonetheless, for no good reason, decided to run it? Was an attempt to avoid a known risk so grossly negligent as to deserve punishment? Was there a degree of inattention or failure to have regard to risk, going beyond mere inadvertence?
122.
[1995] 1 AC 171 [187].
123.
Op. cit., 187.
124.
Op. cit., 187.
125.
Quick, ‘Medicine, Mistakes and Manslaughter’, 191.
126.
Op. cit., 191.
127.
T. Ward, ‘Usurping the Role of the Jury? Expert Evidence and Witness Credibility in English Criminal Trials’,
128.
[2016] EWCA Crim 1716.
129.
[2017] EWCA Crim 1168.
130.
[2020] EWCA Crim 1093.
131.
132.
[2005] 1 Cr App R 21.
133.
[2005] 1 Cr App R 21.
134.
Ashworth and Horder,
135.
Quick, ‘Medicine, Mistakes and Manslaughter’, p. 189.
136.
Ashworth and Horder,
137.
A. Mullock, ‘Gross Negligence Manslaughter and the Puzzling Implications of Negligent Ignorance: Rose v R [2017] EWCA Crim 1168’,
138.
139.
It was assumed that the ongoing medical issues may have had a viral cause, although no diagnosis had been made.
140.
The conviction was subsequently successfully appealed.
141.
[2017] EWCA Crim 1168.
142.
K. Laird, ‘Manslaughter: R v Rose (Honey Maria) Court of Appeal’,
143.
[2020] EWCA Crim 1093 (Lord Burnett) [5].
144.
Mullock, ‘Gross Negligence Manslaughter and the Puzzling Implications of Negligent Ignorance’, p. 354.
147.
J. De Fauw, Joseph R. Ledsam, Bernardino Romera-Paredes, Stanislav Nikolov, Nenad Tomasev, Sam Blackwell, Harry Askham, Xavier Glorot, Brendan O’Donoghue, Daniel Visentin, George van den Driessche, Balaji Lakshminarayanan, Clemens Meyer, Faith Mackinder, Simon Bouton, Kareem Ayoub, Dominic King, Alan Karthikesalingam, Cían O. Hughes, Demis Hassabis, Trevor Back, Mustafa Suleyman, Julien Cornebise, and Olaf Ronneberger, ‘Clinically Applicable Deep Learning for Diagnosis and Referral in Retinal Disease’,
148.
J. M. Ahn, S. Kim, K. S. Ahn, S. H. Cho, and U. S. Kim, ‘Accuracy of Machine Learning for Differentiation Between Optic Neuropathies and Pseudopapilledema’,
149.
151.
M. Fraser, Enrico Coiera, David Wong, ‘Safety of Patient-Facing Digital Symptom Checkers’,
152.
Das, ‘It’s Hysteria, Not a Heart Attack, GP Babylon App Tells Women’; Carding, ‘Regulator Reveals “Concerns” Over Babylon’s Chatbot’.
153.
For information about the Babylon triage model see A. Baker, Y. Perov, K. Middleton, J. Baxter, D. Mullarkey, D. Sangar, M. Butt, A. DoRosario, and S. Johri, ‘A Comparison of Artificial Intelligence and Human Doctors for the Purpose of Triage and Diagnosis’,
154.
It would also have demonstrated exactly when this risk became known to the clinician and provided clear evidence of the subjective epistemic condition of the clinician at that time.
155.
M. Komorowski, L. A. Celi, O. Badawi, A. C. Gordon, and A. A. Faisal, ‘The Artificial Intelligence Clinician Learns Optimal Treatment Strategies for Sepsis in Intensive Care’,
156.
[2005] 1 Cr App R 21.
157.
[2017] EWCA Crim 1716.
158.
[2016] EWCA Crim 1841.
159.
[2017] EWCA Crim 1716 (Sir Brian Leveson P) [140].
160.
[2005] 1 Cr App R 328.
161.
Op. cit. (Judge LJ) [L4].
162.
Dr Misra and Dr Srivastava were responsible for day and night shifts, respectively.
163.
164.
165.
It is worth noting that in English law drivers are usually held criminally responsible when using a mobile device at the time of a fatal collision. Such conduct is indictable as gross negligence manslaughter, causing death by dangerous driving (s 1 RTA 1988), or causing death by careless driving (s 2B RTA 1988). The cases of
166.
NTSB Adopted Board Report, ‘Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian, Tempe, Arizona, March 2018’ at vii, available at
(accessed 1 June 2021). There were multiple other relevant factors that may have contributed to the accident, including: the inattention of the ‘backup driver’ at a crucial moment; the removal of the advanced collision warning system as a safety redundancy; a lack of monitoring and training of backup drivers; the absence of any system to ensure human operators were not becoming complacent; the drugs found in the victim’s system, highlighted as a possible factor; and a lack of sufficient regulation of automated vehicles, also highlighted as a factor in the crash.
167.
The trial is expected to take place in June 2023.
168.
H. L. A. Hart, ‘Negligence, Mens Rea and Criminal Responsibility’ in
169.
In fact, Ashworth advocates its application to other areas of criminal law: see A. Ashworth,
170.
It must be noted that there are suggestions that the GNM test has not completely abandoned subjectivity, because there is evidence that prosecutors still look for it when determining whether to initiate a prosecution. See O. Quick,
171.
L. Alexander and Kimberly Kessler Ferzan,
172.
Smith, ‘The Element of Chance in Criminal Liability’, p. 73.
173.
Lodge, ‘Gross Negligence Manslaughter on the Cusp’.
174.
M. Brazier, ‘From “Theatre” to the Dock – Via the Mortuary’ in M. Brazier and S. Ost eds.,
175.
Knowledge and control form the Aristotelian conditions of responsibility: Aristotle,
176.
[1995] 1 AC 171 (Lord Williams of Mostyn QC, for the defence) [173].
177.
178.
179.
Quick, ‘Medicine, Mistakes and Manslaughter’, p. 199.
180.
181.
182.
183.
J. Gardner and H. Jung, ‘Making Sense of Mens Rea: Antony Duff’s Account’,
184.
For discussion see: O. Quick, ‘Medical Killing: Need for a Specific Offence?’ in C. M. V. Clarkson and S. Cunningham, eds.,
185.
Quick, ‘Medicine, Mistakes and Manslaughter’, p. 190.
186.
G. Williams,
187.
Merry and McCall Smith, ‘Errors, Medicine and the Law’, p. 2.
188.
D. Husak,
189.
190.
A more minimal application of the criminal law is also likely to result in fewer police investigations; such investigations themselves damage trust and create fear within the medical profession, a factor that is often overlooked because manslaughter convictions are still very rare occurrences.
191.
Holm et al., ‘A New Argument for No-Fault Compensation in Healthcare’.
192.
S. Dekker,
193.
A. Sanders, ‘Victims’ Voices, Victims’ Interests and Criminal Justice in the Healthcare Setting’ in D. Griffiths and A. Sanders eds.,
