Abstract
This article examines the development of a predictive system intended to forecast favourable trial outcomes within Chile's Public Defender's Office, analysing how ethical principles are negotiated, contested and transformed throughout its design and implementation. Drawing on six months of ethnographic fieldwork – including participant observation, planning meetings and interviews with developers, officials and public defenders – the study shows that ethics does not operate as a stable, universal framework. Instead, it emerges as a situated ‘matter of concern’ shaped by institutional asymmetries, competing professional repertoires and the constraints of public-sector infrastructures. While Chile's AI governance frameworks endorse principles such as fairness, accountability and transparency, the translation of these ideals into practice generated frictions across three phases: problem identification, operationalisation through ‘business rules’ and system rollout. Developers approached the project as a technical classification task; officials framed it as a managerial instrument for audit and optimisation; defenders perceived it as a mechanism of surveillance that failed to capture the complexity of legal practice. These divergent imaginaries exposed the limits of abstract ethical guidelines and the risks of ethics-washing in public administration. By foregrounding everyday negotiations, the article argues for context-sensitive, participatory approaches to AI governance that extend beyond compliance with high-level ethical principles.
In recent years, the integration of predictive systems into public sector governance has accelerated, driven by promises of improved efficiency and decision-making (Gutiérrez Rodríguez, 2023). Yet this integration has surfaced deep tensions between bureaucratic procedures, technical expertise and the lived experiences of those directly affected by algorithmic decisions (Henriksen and Blond, 2023). Ethics has emerged as a central framework for governing such systems, especially amidst concerns about data privacy, systemic bias and the lack of institutional accountability (Crawford and Schultz, 2019).
While some scholars argue that predictive systems should align with social norms to promote public well-being (Ullrich and Diefenbach, 2023), others frame algorithmic reasoning as a rational foundation for moral progress (Gabriel, 2020). In both perspectives, ethics often appears as an abstract solution to the social frictions generated by artificial intelligence (AI) (Morley, 2023). However, tools such as codes of ethics and high-level guidelines frequently fail to translate into practice, offering limited guidance for real-world implementation (Morley et al., 2020). In practice, ethical decision-making becomes even more complex when it is negotiated among developers, public officials and end-users (Sigfrids et al., 2022).
Mainstream frameworks such as the OECD's AI Principles (2019) and UNESCO's Recommendation on the Ethics of AI (2022) promote core values such as transparency, fairness and accountability. However, critics argue that these efforts often lack enforceability – a phenomenon labelled ‘ethics-washing’ or ‘tech ethics’, where ethical principles are invoked rhetorically without institutional consequences (Green, 2021; Humeres et al., 2025). Moreover, scholars increasingly emphasise that ethics should not replace but complement legal and regulatory frameworks (Floridi and Cowls, 2019), calling for more flexible, transparent approaches that can adapt to emerging value conflicts (Helm et al., 2022).
Despite these developments, there remains limited empirical understanding of how ethical principles are negotiated, resisted or reinterpreted by the workers responsible for building and implementing algorithmic systems – an issue that becomes particularly salient in critiques of ethics-washing, where ethics is deployed symbolically rather than substantively. This article positions itself at the intersection of three debates: (1) algorithmic ethics and the politics of AI governance; (2) critiques of ethics-washing and the limits of voluntary ethical guidelines; and (3) AI accountability as an ongoing, situated practice shaped by institutional constraints and professional repertoires.
In national contexts like Chile, ethics has become one of the pillars of the National Artificial Intelligence Policy – updated in 2025 with the aim of positioning the country regionally in terms of implementation (Ministerio de Ciencia, Tecnología, Conocimiento e Innovación, 2025: 2). Building on this approach, the current centre-left administration has sought to mitigate the potential negative consequences of automated systems without discouraging AI investment, developing a set of intervention strategies informed by international experience (Arriagada-Bruneau et al., 2025; Hermosilla et al., 2021). Yet this discourse often obscures controversies among internationally influential groups that seek to evade responsibility by delegating regulatory design to corporate discretion (Humeres et al., 2025). Moreover, the uncritical adoption of statistical equity metrics – effective for bias recognition in the Global North – can produce adverse results in peripheral regions, where diversity and structural inequalities demand context-sensitive approaches (Borch, 2025).
Despite the growth of regulatory frameworks and critical scholarship, a key gap remains: how ethics is perceived, negotiated and enacted by the workers involved in designing and implementing AI systems. Cross-national studies in China (Di, 2023), the US (Nedzhvetskaya and Tan, 2022) and Brazil (Soares Seto, 2025) show that tech workers hold ambivalent views shaped by their local contexts and power dynamics. In some cases, they have engaged in collective action to contest how harm is defined and addressed within AI development. These findings underscore the need to move beyond top-down ethical frameworks and systematically examine the disputes and tensions that emerge among diverse, and often unequal, stakeholders.
In Chile, the introduction of predictive systems is embedded in a longer trajectory in which technology and governance have repeatedly blurred into one another, turning the country into a laboratory where calculative devices participate in defining the political (Tironi and Barandiarán, 2014). From early cybernetic experiments to the consolidation of a managerial state, automation has consistently been cast as a privileged instrument of state modernisation (Medina, 2005). In the current post-neoliberal conjuncture, however, this history is rarely made explicit. Frameworks of ‘ethical’ or ‘responsible’ AI promoted by organisations in the Global North are often adopted as seemingly self-evident benchmarks, even though – as our case suggests – their incorporation remains fragile and contested and may sit uneasily with local institutional asymmetries and concrete arrangements of justice and algorithmic governance in Chile.
This article contributes to debates on algorithmic ethics and AI accountability by showing how ethical principles are not simply implemented but actively contested in everyday institutional practice. In doing so, we foreground ethics not as a stable normative framework but as a matter of concern (Latour, 2005) – a negotiated, evidentiary and power-laden terrain where actors mobilise competing notions of fairness, responsibility and due process. While this approach highlights the situated nature of ethical reasoning, it does not replace the need for regulatory safeguards; rather, it reveals where abstract principles fail when confronted with real institutional frictions. This perspective helps explain why ethics-washing thrives: abstract principles become detached from the situated frictions, evidentiary demands and institutional constraints that shape actual decision-making. We demonstrate how ethical concerns generate competing evidentiary demands, fractures between professional repertoires and divergent interpretations of responsibility. By tracing these dynamics, we argue for a pluralised, context-sensitive understanding of algorithmic ethics, where accountability emerges from concrete negotiations rather than compliance with voluntary guidelines.
Building on this conceptual lens, the article examines the Chilean Public Defender's Office (PDO) and its attempt to deploy a predictive system designed to forecast favourable trial outcomes. The initiative emerged within a broader process of digital transformation across Chile's judicial system – one that began in 2016 with the digitisation of judicial procedures and expanded to incorporate data-driven tools aimed at improving administrative efficiency (Amunátegui et al., 2022). In this context, and amidst chronic resource shortages, the PDO adopted an automated system to assist case auditing and guide defence strategies by estimating the likelihood of favourable results and comparing them with actual outcomes. ‘Favourable outcomes’ were defined in binary terms – acquittals, reduced sentences or alternative resolutions – based on variables developed jointly by the consulting team and the PDO. Because of the highly sensitive nature of legal evaluation and the reputational risks involved, the PDO was selected as a pilot institution to embed ethical safeguards in automation and to standardise the assessment of defence lawyers. What had been framed as a neutral performance-improvement tool quickly exposed deeper tensions around responsibility, fairness and due process – issues central to the ethics of public-sector AI.
Rather than starting from a predefined notion of ethics (Crawford and Calo, 2016) or applying a normative-critical lens (Iliadis and Russo, 2016), this study adopts an inductive, pragmatic approach. Drawing on interviews and fieldwork, we examine how actors articulate and justify their decisions in practice (Boltanski and Thévenot, 1999), focusing on their lived experiences during the system's development (Currie and Podoletz, 2023).
Our findings depict a more mundane and fragmented reality than what international standards envision. Paradoxically, efforts to raise ethical concerns have deepened divisions and hardened opposing stances, reducing the willingness to engage across different evaluation repertoires. The PDO case illustrates the limitations of abstract ethical frameworks – often promoted by international organisations such as UNESCO or the OECD – in addressing the situated challenges of AI implementation. It highlights the need to understand ethics not as a fixed standard, but as a contested terrain shaped by power, context and everyday institutional practice.
Beyond principles: Rethinking ethics as a ‘matter of concern’
Ethics has been defined as a set of guiding principles and frameworks that inform the responsible development and deployment of AI technologies (Floridi and Cowls, 2019). In recent years, the ethical implications of algorithm-based systems have been extensively explored through comparative studies (Tsamados et al., 2021). These studies have shed light on various aspects, such as the benefits of ethical standards in auditing processes (Mökander and Floridi, 2021), the limitations of transparency in governing algorithmic systems (Ananny and Crawford, 2016) and the identification of biases affecting fairness perceptions (Kordzadeh and Ghasemaghaei, 2021).
Algorithms are complex socio-technical systems (Ananny and Crawford, 2016) that raise concerns not only about their code but also, and perhaps more significantly, about the people involved (Crawford and Calo, 2016). As Floridi (2018: 2142) suggests, an ethical approach to the design and implementation of AI systems carries a dual benefit: it maximises opportunities while anticipating, reducing or preventing potential harms. Realising this benefit requires ‘addressing challenging questions about their design, development, deployment, and usage’. This includes examining practices, users and the data that sustain the entire algorithmic life cycle (Cath et al., 2018).
Ethical guidelines and frameworks for the making and development of AI systems – developed, for instance, by organisations such as the OECD (2019) or UNESCO (2022) – seek to promote beneficence, non-maleficence, autonomy, justice and explicability in the public realm. However, as Morley et al. (2020) argue, translating abstract ethical principles such as fairness, accountability and transparency into concrete tools and practices remains challenging. Their study finds that publicly available AI ethics tools vary widely in their approaches and struggle to bridge the gap between principle and practice, often lacking guidance for real-world application. The absence of standardised frameworks results in inconsistencies across AI systems, highlighting the need for more cohesive methods for integrating ethics into AI development and governance.
Importantly, these limitations have led scholars to argue that ethical guidelines can be easily co-opted as forms of ‘ethics-washing’ – that is, voluntary, vague or strategically deployed frameworks that signal responsibility without imposing accountability (Green, 2021; Humeres et al., 2025). In this view, ethics becomes a discursive resource used to pre-empt regulation or to diffuse public criticism rather than a tool for institutional change. This critique is especially relevant in public-sector settings, where competing bureaucratic mandates, political pressures and public scrutiny create incentives to adopt ethical language while leaving underlying practices intact.
Despite these valuable contributions, the concept of ‘ethics’ often remains unquestioned as a normative framework or as a practical achievement, assumed to be necessary without being subject to debate – in practice – between different parties and their diverse perspectives (Helm et al., 2022). In other words, ethical standards are conceived as a necessity rather than as a matter of debate that weighs expert and lay opinions in real situations.
To address this, we propose studying ethics in high-tech contexts as a ‘matter of concern’ (Latour, 2005) for multiple actors, each with their own repertoires of evaluation (Lamont, 2012). Latour distinguishes between ‘matters of fact’ – objective, detached statements – and ‘matters of concern’, which encompass the heterogeneous human and non-human elements that make issues contestable, negotiable and unfinished. Drawing on this approach, Ratner and Thylstrup (2025) show how the use of administrative records in predictive models produces new ‘afterlives’ for citizen data, revealing internal disputes over relevance, ethical legitimacy and the boundaries of algorithmic decision-making. In design research, Stephan (2015) highlights how ‘matters of concern’ function as attractors for organising competing values, needs and frames. While this perspective foregrounds the situated and negotiated character of ethics, it also has limits: contextual negotiation cannot replace stable regulatory frameworks – particularly when addressing persistent structural issues such as algorithmic bias or discriminatory outcomes.
By treating ethics as a matter of concern rather than a settled doctrine, we are able to empirically trace how ethical principles become sites of negotiation, justification and conflict. This approach illuminates how actors mobilise ethical claims to advance, resist or modify automated systems, and how these claims acquire legitimacy or are dismissed in practice. It also enables us to observe how ethical concerns generate new frictions among professional groups, particularly in public-sector organisations where accountability, legality and resource scarcity collide. In the Chilean case, this lens is crucial: the country's territorial and social diversity produces strong regional variation in crime patterns and legal practices, making standardised evaluation criteria highly contested. Moreover, concerns about discrimination related to ethnicity and migration status require context-specific adjustments to avoid reinforcing existing inequalities. These particularities show that ethics cannot function as a universal template but must be understood as a situated and negotiated practice.
Rather than starting with a predefined definition of ethics (Crawford and Calo, 2016) or adopting a normative-critical perspective on research findings (Iliadis and Russo, 2016), our approach is inductive. We focus on how actors justify their decisions (Boltanski and Thévenot, 1999) based on their lived experiences with automated systems as they are being developed (Currie and Podoletz, 2023). Inspired by pragmatic sociology (Thévenot, 2006), we analyse how stakeholder groups coordinate without a unifying moral foundation, relying instead on situated formats of compromise and tests of evaluation that sustain action in practice. Through an empirical case in Chile, we aim to systematically explore the disputes and tensions surrounding ethics from the perspectives of different stakeholders (Ratner and Thylstrup, 2025).
This study explores the ‘lived experiences’ (Currie and Podoletz, 2023) of actors responsible for designing and implementing an AI predictive system, focusing on the tensions that arise when attempting to establish a shared definition of ethics within the frameworks set by international organisations. Rather than concealing contradictions and conflicts among stakeholders (Law, 2004), we aim to elucidate how data science-based projects compel participants to engage in ethical negotiations and reconcile differing perspectives.
Studying ethics in action: Methods and fieldwork
The ethnographic study focuses on the development of a predictive system by Chile's PDO, designed to forecast favourable trial outcomes. This system estimates optimal trial results based on comparable cases, determined through expert negotiations. The PDO, a decentralised public service, provides legal defence to individuals facing criminal prosecution without representation. To enhance its auditing processes, the PDO has adopted data-driven methods, leveraging its own records and data science tools to improve the accuracy of outcome predictions for defendants.
To achieve this, the PDO issued a public tender to contract a firm to develop the project. According to the binding terms and conditions of the call, the purpose of the tender was to ‘select and contract an External Audit Service based on data regarding favourable outcomes in the provision of Public Criminal Defence. This service will utilise a predictive system to support audits on key aspects of legal defence, specifically in relation to case resolutions or procedural outcomes. The audit will be conducted using data from the DPP's system and applying AI and data science tools to enhance the evaluation process’.
The tender was awarded to an institute based at a public university in Santiago, Chile. The team in charge of the project comprised scholars specialised in predictive models and big data; we refer to them as AI developers.
We employed two data collection techniques after mapping key actors affected by the predictive system's implementation, including developers, PDO officials and public defenders: 1) participant observation of the system's development process, including working days and coordination activities between the public institution and the technical service provider over six months; and 2) in-depth interviews with key stakeholders to capture their perspectives on the system's implementation.
The research spanned six months in 2022, combining participant observation with semi-structured interviews. Specifically, we conducted in-person observations during 13 planning meetings and five training sessions, all of which were central to the system's design and implementation. These activities facilitated an in-depth analysis of group dynamics, technical discussions and practical concerns that emerged throughout the process.
In total, we conducted nine in-person interviews (in Spanish) targeting three key groups: 1) AI developers from an external institute affiliated with a public university in Santiago, Chile, who were responsible for the system's technical creation and provided insights into design decisions and limitations; 2) PDO officials, who supervised and evaluated the project, offering perspectives on institutional expectations and resource allocation; and 3) public defenders, the system's end users, whose interviews focused on their needs, perceived barriers and their relationship with the technology. 1
This methodological approach allowed us to capture a comprehensive and diverse range of perspectives, spanning technical, institutional and practical dimensions. The combination of these methods ensures the study's validity by addressing multiple viewpoints on the phenomenon under investigation.
Regarding sample selection, we used a mixed strategy. Initially, we followed the development team during their working days and conducted interviews based on preliminary segmentation. To avoid over-representation or exclusion of relevant actors, we adjusted the actor mapping throughout participant observation, selecting key informants directly involved in the process. This flexible strategy enabled us to build a network of contacts based on the case's specificities rather than imposing a predefined categorisation of stakeholders (Atkinson and Flint, 2001).
Access to the case study was facilitated through a consultancy project led by UAI GobLab, 2 an innovation lab based at Universidad Adolfo Ibáñez (UAI) in Chile specialising in the study and incorporation of ethical standards into AI systems in the public sector. UAI GobLab's support included an ethical risk assessment before the system's pilot phase, advisory services, academic research and data science ethnography to analyse interactions among actors in these sociotechnical systems, including participant observation (PDO, 2022). The authors were hired as researchers by UAI GobLab to examine this case study, mapping the process and analysing it for academic purposes. This involved exploring negotiations among public officials seeking efficiency in auditing processes, a specialised data science team responsible for building the model and the concerns of public defenders subject to audit through the predictive system.
Our analysis draws on the translation model (Akrich et al., 2002), which examines how a contested technological object is negotiated across stages on its path toward stabilisation. In contrast to linear diffusion accounts, this approach systematises resistance and conflict among actors with divergent interests in a proposed solution. Guided by this model, we focus on: (i) the needs that precipitated the project (problem identification), (ii) the operationalisation of the predictive system (interessement) and (iii) the negotiations over its relevance and effectiveness under the proposed ethical standards (enrolment).
To document observations and interviews, we systematised relevant information through field notes and transcripts. To protect participant identities, we implemented anonymisation procedures, including encrypted coding of names, and obtained informed consent. The research highlighted access challenges due to conflicting dispositions toward certain project stages, a factor that should be considered in similar studies.
Doing ethics with data: The Chilean PDO
In this section, we present the results as a sequence of negotiation processes among the various actors involved in the design and implementation of the predictive system for forecasting favourable trial outcomes at the PDO.
The development of the PDO's predictive system illustrates how ethical reasoning is not absent but dispersed, displaced and negotiated across various institutional actors. Rather than following a predefined ethical script, stakeholders enact divergent understandings of fairness, value and responsibility – often without a shared vocabulary. This fragmentation reveals the limits of abstract ethical frameworks and underscores the need for governance models attentive to the situated, contested and collective nature of ethical work in AI implementation (Currie and Podoletz, 2023; Helm et al., 2022).
Making defence measurable: The birth of a predictive project
The system began as a response to resource scarcity and long-standing debates over ‘good defence work’, revealing how ethical tensions appear early, before any technical design (Crawford and Calo, 2016). While the PDO frames the system as a managerial tool for improving accountability and auditability, defenders perceive this as a threat to professional discretion and legal complexity. This section shows how early technical requirements encode assumptions about what constitutes ‘good’ defence work, narrowing the space for ethical nuance. These divergences reflect competing justificatory regimes (Boltanski and Thévenot, 1999) and prefigure tensions that will unfold throughout the system's operationalisation.
The PDO seeks to develop an external, data-driven audit of legal defence services by comparing predictive models with actual case outcomes. In the tender document, the institution specifies that the tool should ‘use AI and data science to assess defence performance, identify key variables and detect regional gaps and risk areas’ while also supporting ‘continuous improvement’ through training and iterative feedback. In addition, the project mandates the production of an audit report in Power BI, intended to translate these predictive analytics into insights and recommendations for the progressive adoption of such tools in legal defence oversight (Public Defender Office, May 19, 2022).
As one PDO official put it, the core need is to ‘evaluate my defenders’ performance based on a tool that broadly predicts case outcomes’. This goal arises from a set of existing evaluation needs that were later translated into technical requirements for the predictive system.
The variables 3 used to evaluate defenders’ performance are mandated by law and must be included in the system. They are monitored through quantitative and qualitative instruments. Quantitatively, monthly performance reports are generated from daily records that defenders enter into the system, using national indicators such as defendant interviews, illegality claims and preventive detention requests. Some indicators are tied to fixed payments, where non-compliance can lead to sanctions (including contract termination), while others are variable-pay indicators that grant access to bonuses.
Qualitatively, focused evaluations are conducted through three mechanisms. First, annual inspections by 14 national inspectors assess performance in hearings, combining expert judgement, file reviews and interviews with defenders, users and judges. Second, reactive inspections (complaints) address specific cases that trigger dissatisfaction; if analysis reveals deviations from expected standards, reprimands or sanctions may follow. Third, annual external audits review aspects of defence aimed at enhancing user satisfaction, monitoring perceptions of service in hearings, offices and prisons. Although interventions based on these audits are hard to implement, the PDO values the data they provide for action plans.
From defenders’ perspectives, however, these variables – and especially the instruments used to measure them – reproduce longstanding problems in their work. They demand a more focused selection of defenders to be evaluated, noting that the current sample of 15 cases is ‘extremely few’ to get a general impression of performance. Interviewees also stress the ‘human factor’ over machine-based criteria when identifying, in advance, which cases are most likely to comply with standards.
Another source of dissatisfaction concerns the expertise of evaluators. Defenders describe them as removed from courtroom realities and lacking recent hearing experience, offering little insight beyond what defenders already know. They highlight a gap between the growing complexity of defence work, which requires specialised knowledge, and an inspection model centred on generic compliance with manuals. This standardised approach overlooks local specificities and treats heterogeneous regions as comparable under the same criteria.
The tension between local nuances and national standards obscures how cases are distributed geographically and how crimes vary in complexity. Interviewees stress that robbery, traffic accidents, fraud and homicide are evaluated with the same indicators, producing uniform outputs that ignore case-specific conditions. For defenders, this creates an ethical dilemma: prioritising the unique aspects of a case to benefit the accused may jeopardise their own standing under rigid evaluation criteria.
A more limited understanding of the predictive system's underlying needs is evident among the providers responsible for its development. When asked about the problem they were addressing, they referred to a presumed ‘accumulation of cases’ or the need for specialised training to manage large volumes of data for managerial decision-making: ‘We lack a detailed understanding of the management scope above us; we haven't had a meeting regarding that aspect, specifically on how the PDO is managed. However, I suspect that this tool is valuable for management decision-making’ (Andrés, AI developer provider, Santiago).
In this setting, providers show a solid grasp of the outcomes the PDO is aiming for – encoded in business rules for favourable and unfavourable results – but little interest in the evaluation procedures that shape defenders’ everyday concerns. As one developer explains, they ‘understand very well what [the PDO] want to achieve and the challenges they face’, but still lack clarity about the tool's concrete purpose, which they vaguely assume to be ‘decision-making support’, with the exact scope remaining undefined (Pedro, AI developer provider, Santiago).
Conversely, the PDO carefully curates which types of proceedings it wants the system to touch and how it intends to steer defence management, while defenders have only partial insight into these objectives and point to unmet needs that fall outside standardised instruments.
Taken together, the problem articulated by the PDO sets in motion a reconfiguration of the existing evaluative infrastructure around the actions to be predicted. Legal obligations, performance metrics and audit routines continue to operate, but once they are inscribed as features of the model, the practical premises on which they rested are unsettled. What appears as a standard delegation of judgement to the model's architecture instead becomes a focal point for critique: rather than channelling evaluation through a socially accepted predictive device, it foregrounds ethical concerns about fairness, discretion and local specificity, and redistributes them across a fragile and contested technical arrangement.
Translating ethics into ‘business rules’
Translating goals into variables exposed conflicts between managerial simplification and the discretionary nature of legal practice, illustrating how design decisions embed moral assumptions (Morley et al., 2020). Operationalising the system's goals involves the translation of abstract managerial aspirations into codified technical rules – a process that appears objective but is deeply embedded in organisational politics and normative choices. Here, ethics becomes an attempt to standardise and automate judgement through ‘business rules’, provoking resistance from those excluded from their formulation. This phase shows how moral judgement is displaced into technical classification, foregrounding the gap between rule-based governance and the situated knowledge of practitioners and the difficulty of building a shared ethical language across actors.
A committee of expert lawyers convened weekly to operationalise the needs identified by the PDO and to establish the business rules used to distinguish between favourable and unfavourable case outcomes. The aim was to define a standard based on the nature of each procedure and, in some instances, on the specific characteristics of each offence. This emphasis on offence specificity in evaluating defenders’ performance became a key driver of the initiative.
The commission comprised six to seven defenders from various regions, each representing different profiles. Participants acknowledge that reaching consensus on what counts as a favourable outcome was difficult, given the inherently contentious nature of the discussions: ‘Conversations stretched on for 5–6 months; here, we always say, put 5 lawyers together and you'll end up with 6 conclusions’. Managing consensus after each session thus posed an ongoing challenge for the PDO.
Despite access to meeting minutes through transparency requests, both the PDO's technical team and the suppliers lacked a full picture of the consensus reached by the commission. This gap reinforced a predominantly technical understanding of outcome identification: an outcome is ‘favourable’ if it meets a predefined set of business rules, such as securing a reduction relative to the sentence requested by the prosecution, which in turn allows the model to classify cases into different ‘types’ of favourable result: 1) No conviction; 2) Includes favourable convictions; 3) Includes favourable convictions and conditional suspension of proceedings (Pedro, AI developer provider, Santiago).
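To make the shape of such a rule set concrete, the following minimal sketch shows one way a binary classification of this kind could be encoded. The field names, conditions and thresholds are our illustrative assumptions rather than the committee's actual rules.

```python
# A hypothetical encoding of 'business rules' for case outcomes. The three
# favourable types follow the classification reported by developers; every
# field name and condition below is an illustrative assumption of ours,
# not the committee's actual rule set.
from dataclasses import dataclass

@dataclass
class CaseOutcome:
    convicted: bool
    requested_sentence_years: float  # sentence sought by the prosecution
    imposed_sentence_years: float    # sentence actually imposed
    conditional_suspension: bool     # conditional suspension of proceedings

def classify(outcome: CaseOutcome) -> str:
    """Map a case onto the three reported types of 'favourable' result."""
    if outcome.conditional_suspension:
        return "favourable: conditional suspension of proceedings"
    if not outcome.convicted:
        return "favourable: no conviction"
    if outcome.imposed_sentence_years < outcome.requested_sentence_years:
        return "favourable: favourable conviction (reduced sentence)"
    return "unfavourable"

# Example: a conviction below the sentence requested by the prosecution.
print(classify(CaseOutcome(True, 5.0, 3.0, False)))
```

Encoded this way, each rule is an explicit, inspectable condition; the frictions described below arise precisely because the actual conditions remained opaque to most of those evaluated by them.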
Some defenders described the committee's composition as ‘administrative and non-operational’. Others reported that they were initially invited but later excluded: ‘I have no clue who the committee of experts are. I know that our region was invited, but they removed us, and I have no idea why or who remained’.
As a result, defenders identify several potential issues that this process poses for the predictive system: they demand greater transparency about the range of offences associated with each category of favourable outcome, question the criteria used to define favourability and argue that the model should be dynamically adapted to ongoing legal changes driven by criminal policy or prosecutorial decisions, which may affect the system's expected performance.
In a similar vein, defenders argue that the binary categorisation of outcomes as favourable or unfavourable flattens important nuances. They describe the current approach as ‘simplistic’, insofar as it classifies favourability mainly by offence type and rough shifts in penalty (for instance, whether the sentence was reduced by one or two degrees or whether the defendant is deemed to have ‘performed well’). Such metrics, they note, fail to capture cases where it is not clear that the defendant has in fact fared better, yet the outcome is nonetheless coded as favourable (Felix, PDO defender, Santiago).
The opacity attached to this quasi-democratic origin myth – grounded in expert opinion and in a presumed prior consensus that no one can fully reconstruct – now takes the form of ‘business rules’ that redefine success and failure. What was once presented as a collectively validated standard is recast as a binary rule set whose foundations remain largely inscrutable. Defenders repeatedly contest this black box that regulates the problem in absolute terms, while providers de-problematise its significance by treating its rigidity as a purely technical given. As a result, the very legitimacy of the evaluative format is thrown into question, eroding trust in the system's classification of case outcomes and starkly exposing, from defenders’ point of view, the gap between the objectivity promised by technical categorisation and the situated knowledge of those who litigate in specific contexts.
Whose data, whose ethics? Negotiating matters of concern
In the negotiation stage, divergent positions come into conflict, particularly around what counts as meaningful data and whose interpretations matter. Actors disputed variable relevance and fairness, especially given regional and socio-ethnic disparities – echoing how data practices create contested, unfinished ‘matters of concern’ (Latour, 2005; Ratner and Thylstrup, 2025). This section explores how technical actors, institutional managers and defenders confront the limits of their respective expertise. Ethical reasoning here is not absent but redistributed – externalised, proceduralised or rendered invisible through the division of labour. These negotiations reveal the infrastructural politics of predictive systems (Iliadis and Russo, 2016), where assumptions about objectivity and neutrality serve to sideline questions of justice, discretion and social impact.
Based on the PDO's need to enhance defence management through indicators, the institution proposed developing a predictive system using data from recent years and hired a technical team of suppliers to build the model. For these developers, the problem was initially framed in strictly technical terms: the task was to turn the ‘uncertainty surrounding case outcomes’ into a finite set of possible exits that could be encoded as business rules and predicted accordingly (Pedro, AI developer provider, Santiago).
The algorithm's inputs consist of a case database provided by the PDO and used to train the model 4 . According to the suppliers, the early stages were confusing and required them to ‘adjust to the language’ of the institution and align with the data owners. PDO officials describe this process similarly, noting that they not only delivered the data but also had to ensure that developers grasped its institutional meaning and learned how to extract ‘implicit information’ that was not immediately obvious from the raw fields (Hernan, PDO official, Santiago).
Understanding and cleaning the database took six weeks and involved several revisions. The technical team split the problem into smaller segments, constructing seven models with TensorFlow so as to work with more homogeneous categories by type of crime and to address what they described as ‘a classification problem’, depending on the kind of outputs.
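As an illustration of this segmentation strategy, the sketch below shows one way per-crime-type classifiers of this kind could be assembled in TensorFlow. The crime categories, feature dimensions and network architecture are hypothetical stand-ins rather than the providers' actual design.

```python
# A minimal sketch of the per-crime-type setup: one binary classifier per
# category, each predicting a favourable/unfavourable outcome from case
# features. Crime categories, feature count and architecture are our
# illustrative assumptions, not the providers' actual specification.
import numpy as np
import tensorflow as tf

CRIME_TYPES = ["robbery", "traffic", "fraud", "homicide"]  # illustrative subset of the seven

def build_outcome_classifier(n_features: int) -> tf.keras.Model:
    """Binary classifier: probability that a case ends favourably."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One model per crime type, trained only on that type's cases, so each
# learns from a more homogeneous slice of the database.
models = {crime: build_outcome_classifier(n_features=12) for crime in CRIME_TYPES}

rng = np.random.default_rng(0)
for crime, model in models.items():
    X = rng.normal(size=(500, 12)).astype("float32")          # dummy case features
    y = rng.integers(0, 2, size=(500, 1)).astype("float32")   # 1 = favourable outcome
    model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```

Splitting by crime type trades sample size for homogeneity: each model sees fewer cases, but those cases are more comparable, which is consistent with the team's framing of the task as several smaller classification problems.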
The first major divergence between suppliers and the PDO emerged from the structure and interpretation of the data. The technical team expected a database already aligned with the PDO's specification, while PDO officials emphasise the ongoing effort needed to ensure that suppliers grasped the institutional meaning of the data: ‘You assume they grasped it, but that's not always the case’ (Hernan, PDO official, Santiago).
This divergence extended to the criteria applied to variables that each side considered crucial. Prior criminal history, for instance, was included after explicit insistence from PDO officials, who grounded their position in regulatory and professional criteria: ‘I was quite insistent with them about leaving out the previous criminal history from the analysis, and I stressed that it's an issue that must be addressed, and I explained how to gather that implicit information, so they had to include it. Because within the realm of our work, trying to predict an outcome without considering whether the defendant had prior serious criminal charges or convictions related to those crimes is impossible. That's a fundamental condition to consider in order to project anything. So, aside from that, in that particular case, it wasn't a negotiation but rather an imposition’ (Juan, PDO official, Santiago).
The contrast is particularly visible when ethical standards are discussed: ‘The concept of a favourable or unfavourable outcome is a technical one, devoid of considerations regarding gender, age or any other personal attributes; it's purely technical, reflecting how cases are concluded. Acquittals are acquittals, regardless of past biases against foreigners; that's merely an indicator of the case's nature, which we don't influence. Our goal is simply to predict in advance how cases will conclude, whether favourably or not, from a technical standpoint’. (Miguel, PDO defender, Santiago). ‘We deal strictly with numbers; we have no insight into the identities of the accused. None whatsoever. We can't visualise them. While we receive certain outcomes, we refrain from interpreting them since we lack expertise in the field. One could argue, “This interpretation is unethical”, but such judgements aren't within our purview; it's the role of the defence to interpret the data’. (Pedro, AI developer provider, Santiago).
Responsibility for bias is likewise externalised. PDO officials point to constraints in the available data – especially segmentation variables such as gender, where women represent less than 10% of the accused – and to limited time series (favourable outcomes being recorded only since 2017), which they argue restricts the possibility of meaningful bias analyses. Similar issues arise for defendants under 18, where data is sparse. At the same time, there is enthusiasm about incorporating court-specific variables to anticipate resource demands, such as the frequency and cost of expert reports, and to inform negotiations with the Ministry of Justice.
Initially, both the PDO and the suppliers linked ethical considerations primarily to anonymisation procedures: replacing identity card numbers, deleting repeated records and ensuring that developers could not identify individuals. From the developers’ point of view, this level of anonymisation was exceptional compared to other projects: they stressed that here ‘we have no idea what we’re processing’ and ‘we don’t know who they are’, highlighting that the PDO had taken unusual care to break any link between data and identifiable persons, even if this meant a longer process to prepare and deliver the dataset.
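The following minimal sketch illustrates the kind of pre-delivery anonymisation described here, under our own assumptions about column names and the hashing scheme; the PDO's actual procedure is not documented in our material.

```python
# A sketch of the reported anonymisation steps: replace identity card
# numbers (RUT) with opaque codes and delete repeated records before the
# dataset reaches developers. Column names and the salted-hash scheme are
# our assumptions; only the general idea, severing data from identifiable
# persons, reflects the case.
import hashlib
import pandas as pd

SALT = "institution-held-secret"  # retained by the data owner, never shared

def pseudonymise(rut: str) -> str:
    """One-way salted hash: developers cannot recover the underlying identity."""
    return hashlib.sha256((SALT + rut).encode()).hexdigest()[:12]

cases = pd.DataFrame({
    "rut": ["11.111.111-1", "22.222.222-2", "11.111.111-1"],
    "offence": ["robbery", "fraud", "robbery"],
    "outcome": ["favourable", "unfavourable", "favourable"],
})

cases["person_id"] = cases["rut"].map(pseudonymise)
cases = cases.drop(columns=["rut"]).drop_duplicates()  # drop identifier and repeats
print(cases)
```

On this sketch, only the institution holding the salt could re-link a code to a person, which is consistent with the developers' report that they could not identify the individuals behind the records they processed.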
In broad terms, the negotiations during model development reveal three distinct orientations: a ‘technical’ stance within the PDO, grounded in predefined business rules; an ‘instrumental’ stance among AI developers, focused on optimising predictions with minimal interpretative engagement; and a fragmented professional stance among defenders, who insist on case-based expertise that exceeds both technical and managerial framings.
Ethical questions circulate here as a displaced matter of concern, rarely anchored in the specific domain of expertise being consulted. The attempt to integrate ethical standards triggers a chain of deferrals in which actors display their repertoires of validity while shifting responsibility elsewhere, so that the ‘proper’ locus of ethical accountability is always slightly out of place. The law of large numbers pursued by developers – imagined as latent in the data – does not align with the legal constraints that lead the PDO to insist on variables that are statistically marginal from a modelling perspective but normatively central. In this sequence of stalled negotiations, defenders’ demand for a situated consideration of cases becomes particularly troublesome for project managers. Predictive modelling thus emerges as a contested arena where technical, bureaucratic and professional justifications collide without a stabilising mediator, and where ethics figures less as a coherent framework than as a moving target that redistributes responsibility and exposes the fragility of claims to objectivity in evaluative practice.
Rolling out prediction: Between optimisation and surveillance
The implementation phase crystallises tensions between imagined and actual uses of the system. While institutional actors envision the tool as a neutral instrument for organisational optimisation, defenders interpret it through a lens of surveillance and professional disempowerment. In use, the system triggered new concerns about accuracy, accountability and professional evaluation, showing the limits of universal ethical guidelines and the need for context-sensitive governance (Green, 2021). This section examines how competing repertoires (Lamont, 2012) and sociotechnical imaginaries (Jasanoff, 2015) shape the rollout of the system, revealing fractures in institutional trust and divergent expectations about transparency and accountability. Ethical engagement remains elusive – shielded by opacity, framed as technical or withheld to manage internal dissent – highlighting the challenges of integrating AI tools into bureaucratic settings in socially responsive ways.
The projected implementation of the system reveals varying perspectives on its future utility, highlighting gaps between the PDO's planning and the realities faced by public defenders. Specifically, the model's training and execution are planned around data collected within the last five years, supplemented by information from the past 10 years currently under processing. The PDO envisions executing the model quarterly or biannually, allowing for what they consider ‘a reasonable time to alter values or results’.
Providers, however, argue that timing has limited impact on the algorithm itself. They warn that running the model early in the process may yield less reliable results due to incomplete data, while running it later may improve accuracy. Ultimately, they stress that such decisions fall to the PDO audit team and were not clearly specified in the original tender, reflecting a misalignment in expectations about implementation conditions.
From the providers’ perspective, these implementation choices are an ‘audit’ matter, outside their remit. Compared to sectors such as banking, they regard technological deployment as relatively straightforward and see no need for detailed information about how the system will be hosted, managed or released. What they require, they argue, is sufficient data to build the models and a basic functional specification. This reinforces a division between development and implementation: the suppliers focus on technical feasibility, while broader organisational and user-related concerns are treated as secondary.
At the same time, providers acknowledge that neglecting the ‘start-up’ or operational phase increases the risk of unforeseen needs. They concede that developers often overlook functionalities that only become visible once a system is in use, and they recognise implementation as a crucial moment to refine models and incorporate new features based on user feedback.
The intended users of the system are management bodies who would employ it as a tool to observe and evaluate litigators’ performance and to guide defence strategies. These users have varying levels of technical proficiency and would likely require a demonstration stage to familiarise themselves with the tool. Yet the provider team has little information about end-users, treating them as abstract numerical entities rather than situated actors. Questions of organisational structures, decision-making authority and day-to-day practices are seen as external to the algorithm's correct functioning, again underscoring the divide between technical and user-centred considerations.
Defenders, by contrast, note that despite advances in the PDO's IT infrastructure, these resources are not fully available or usable in their daily work. They point to the uneven use of Power BI 5 – extensively adopted in some regions and almost unused in others – as evidence of a gap between data availability and internal analytical capacity. This suggests that existing information is not being systematically leveraged for internal evaluation before turning to external AI solutions. As one defender remarks, what they primarily need is ‘timely and quality information’ and more efficient internal systems for accessing legal materials and navigating databases, which ‘doesn’t necessarily require AI’ (Juan, PDO defender, Santiago).
In summary, there are divergent perspectives on how the predictive tool should be implemented: a relatively straightforward assessment by providers, a performance-oriented view within the PDO and a sceptical stance among defenders. This lack of alignment leads to discrepancies in usage guidelines. While PDO officials emphasise the tool's potential for optimisation through user feedback, the planned implementation remains opaque to many defenders. Worried about misunderstanding or resistance, the PDO opts not to share the tool itself but only aggregated results – one official explains that they will ‘present the conclusions over time’ after technical validation, since giving defenders direct access to the tool would ‘only confuse them’ (Juan, PDO official, Santiago).
This stands in contrast to defenders’ calls for greater transparency about what is measured and how data are used, especially in tools such as Power BI. Several interviewees express concern about being excluded from key discussions and stress the need for access to comprehensive information, with appropriate safeguards for client confidentiality.
Past experiences with predictive tools further complicate legitimacy. Defenders recall a previous PDO model focused on arrest, preventive detention and illegality of detention, which struggled due to heterogeneous data entry practices across regions. Differences in formats and manual encoding by secretaries generated delays and inaccuracies, ultimately undermining the model's usefulness. More broadly, defenders see parallel data-entry systems as time-consuming and unreasonable unless such tasks are fully integrated and mandatory.
These experiences support the perception among defenders that predictive models often produce outputs that diverge from lived reality and offer limited practical insight for advocacy. While PDO officials maintain that extensive work has validated the current project and that it will generate useful information, defenders insist that algorithms need an additional ‘layer of reality’ that cannot be accessed from office settings alone. As one defender puts it, there is a clear difference between ‘being an expert from the outside and being in the field’: colleagues who have not been in a courtroom for years may be highly intelligent and skilled, but they are less attuned to how post-pandemic changes have reshaped the legal landscape and everyday defence practice.
Against this backdrop, some defenders call for broader participation in designing and rolling out the model, including representation from all affected regions and a phased implementation strategy that allows for local adjustment. They also point to gaps in technological literacy within the defence sector, which may hinder effective use of the system. While many acknowledge the importance of technological tools for streamlining processes and saving resources, they highlight managerial and training barriers that limit their integration into everyday practice. Gradual implementation, particularly targeting younger generations of defenders, is suggested as a way to mitigate cultural resistance.
For their part, PDO officials express concern about defenders’ readiness to engage with predictive systems that might be perceived as tools for evaluation and control. Defenders are described as highly resourceful and critical, capable of challenging models and devising strategies to circumvent them. This partly explains the current opacity around how the system will function and who will be able to interact with it.
Implementation here operates less as the consolidation of a tool than as the staging of a provisional settlement between competing logics of action. The predictive system is kept in suspension between several jurisdictions: promoted as an instrument for optimisation and performance indicators, maintained as a ‘clean’ technical artefact insulated from normative debate and perceived by defenders as a latent extension of surveillance over their work. What becomes institutionalised, at least for now, is not a shared ethical framework but a pattern of strategic distance: each group can appeal to the system's promise – of efficiency, of objectivity and of control – while shifting responsibility for its consequences onto others. Rather than clarifying how evaluation should be conducted, implementation foregrounds the asymmetries in who gets to define problems, interpret outputs and decide which forms of professional judgement can be streamlined in the name of optimisation or, conversely, treated as suspect under the gaze of prediction.
From guidelines to frictions: Rethinking ethical AI through practice
Our study explored the tensions surrounding the development and implementation of a predictive system at the Chilean PDO, focusing on the ethical challenges that emerged in this process. While international frameworks such as UNESCO (2022) and the OECD (2019) promote universal ethical guidelines for AI, our findings reveal the limitations of such abstract principles when applied in high-stakes environments like public criminal defence. Instead of ethics serving as a unifying framework, it became a point of contention among developers, public officials and defenders. This underscores our central contribution: ethics in AI is not a stable set of norms to be implemented but a contested sociotechnical and pragmatic process shaped by institutional cultures, power asymmetries and competing notions of fairness and responsibility.
One core issue is the disconnect between ethical principles and their practical implementation (Morley et al., 2020). Artificial intelligence ethics is often framed as a normative, standardised framework, yet our findings show that ethical concerns emerge through negotiation and contestation in everyday practice (Boltanski and Thévenot, 1999; Di, 2023; Luka and Tan, 2021). While the PDO sought to improve the objectivity and efficiency of legal audits, defenders viewed the tool with scepticism, arguing that it failed to capture the complexity of courtroom dynamics. Developers, in turn, approached the project as a purely technical exercise, emphasising algorithmic performance and remaining detached from legal and social implications (Henriksen and Blond, 2023; Kotliar, 2020). This misalignment reinforced existing institutional tensions. In this space of misalignment, ethical discourse also risked functioning as ethics-washing (Green, 2021), legitimising the project without substantively addressing defenders’ concerns.
Our study also contributes to debates on AI accountability in the public sector (Crawford and Schultz, 2019). Reliance on external AI vendors raises unresolved questions about responsibility and oversight. Developers largely bracketed issues of fairness and transparency as outside their mandate, while PDO management prioritised standardised evaluation metrics over sustained engagement with defenders. When ethics is reduced to indicators or delegated to external experts, it risks becoming a procedural formality rather than a substantive commitment to justice. This proceduralisation intensifies the possibility that ethics becomes prescriptive rather than transformative.
In the Global South, the infrastructures needed to implement such systems are often precarious, and predictive tools are frequently burdened with disproportionate expectations about their capacity to address structural problems in public institutions. Future research should examine more closely the future-oriented expectations that sustain this interest in prediction, and the frictions between the institutional conditions identified by defenders in this case and the developmental promises of Northern sociotechnical imaginaries (Richter et al., 2023). These tensions offer a crucial entry point for understanding how ethical standards are implemented – and reshaped – in situated practices of algorithmic governance.
These findings suggest that ‘responsible AI’ should be understood as a matter of concern (Latour, 2005) – a contested space shaped by competing institutional logics. Ethical claims are constructed, mobilised and disputed in practice, often without producing structural change. This aligns with recent discussions on public-sector AI governance (Sigfrids et al., 2022), which emphasise inclusive and context-sensitive frameworks. At the same time, recognising ethics as negotiated also requires safeguards to ensure that managerial demands for efficiency do not eclipse attention to structural inequalities or bias.
At a theoretical level, accounts of the ‘translation of interests’ (Akrich et al., 2002) may need to relinquish the expectation of a unified technological outcome once ethics is brought into view. Contrary to the widespread assumption that ethical frameworks are designed to foster consensus, our analysis suggests that ethical questioning in practice tends to amplify, rather than reduce, uncertainty: negotiations that appear well-regulated on paper become sites of friction (Tsing, 2004), and guidelines are enacted as public experiments in which issues are rendered as matters of concern and the justificatory repertoires of the actors involved are progressively exposed and tested (Latour, 2005).
Pragmatist philosopher Dewey (1908) understood ethics as a continual reconstruction of values grounded in experience and judged by their practical consequences, rather than by absolute universal criteria. Against both enthusiastic (Floridi and Cowls, 2019) and dismissive (Green, 2021) views of AI ethics, we use this case to sketch an experimental conception of ethics for technological systems and automation. In this perspective, ethical situations are moments when issues become matters of concern: certainty about the ‘correct’ application of a criterion is suspended, made public and opened to dispute, and future courses of action are recalibrated accordingly. In line with pragmatic approaches (Thévenot, 2006), the value of ethical guidelines lies less in providing final verdicts than in their capacity to translate divergent positions into workable practices and to organise forms of interdependence that can orient action across heterogeneous actors and settings.
Practically, this means moving away from top-down governance structures toward processes that actively involve end-users as co-constructors of ethical standards. Continuous dialogue among policymakers, legal practitioners and technical teams is necessary to ensure alignment between predictive systems and the lived realities of those they affect (Currie and Podoletz, 2023). Without such participation, ethics risks being absorbed into institutional routines that obscure rather than enhance accountability.
In conclusion, this study highlights the persistent gap between ethical AI principles and their operationalisation in real-world public-sector contexts. Without mechanisms to integrate multiple professional perspectives, AI governance risks reinforcing institutional silos and undermining accountability. Ethics must therefore be approached not as a static rulebook but as an ongoing, participatory process. Further research is needed on the role of private AI vendors in shaping public decision-making, particularly in criminal justice, where the stakes for fairness and accountability are exceptionally high.
Acknowledgments
The authors thank the Public Defender's Office (DPP) for granting us access to carry out this project.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Agencia Nacional de Investigación y Desarrollo (grant number NCS2024_021) and the Inter-American Development Bank (Project ATN/ME-18240-CH).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
