Abstract
This article challenges the idea that the turn from rule-based algorithms to machine learning systems leads to a decline in formal conceptualization. Through ethnographic research at two artificial intelligence (AI) production sites within the French justice system, the study shows that conceptual labor remains at the heart of machine learning, shifting from algorithm coding to the curation of training data. Revisiting Lawrence Lessig's claim “code is law,” the article argues that in AI, the influence of formal code has waned, but a new form of structured conceptual framing has emerged in the form of ground-truth datasets—where “ground-truth is law.” These datasets, shaped by a range of actors across the AI production chain, subtly guide algorithmic operations under the guise of neutrality. This study applies Anselm Strauss’ “arc of work” framework to identify five critical stages in AI production: goal setting, databasing, taxonomy construction, labeling, and monitoring—and demonstrates how conceptual understandings of the world are embedded within algorithmic systems in each phase of the process. The article then examines two key mechanisms that obscure this work: first, the fragmentation and distribution of AI tasks across a broad range of actors; and second, the shift of conceptual labor away from coding toward data preparation and algorithmic monitoring. This article lays the foundation for an investigative method aimed at ethnographically tracking the entire algorithmic production chain and the diverse actors involved, in order to better document how conceptual labor is integrated into machine learning systems.
Introduction
Since the early 2010s, the exponential growth in computing power and data storage capacities—from 2 to 149 zettabytes globally between 2010 and 2024—seemed to pave the way for a “revolution” in how information is produced and managed (Mayer-Schönberger and Cukier, 2013). This evolution provided fertile ground for the rise of machine learning algorithms, which have been regarded as a breakthrough in the field of artificial intelligence (AI). While first-generation AI systems were based on predefined models of cognitive processes, typically organized as decision trees using “if…then” rules (Neyland and Möllers, 2017), machine learning algorithms produce outputs by identifying patterns within input data through large-scale probabilistic methods: the focus has shifted from providing machines with explicit instructions to feeding them large datasets, allowing them to develop their own analytical frameworks through vector-based computations (Domingos, 2015). The mathematical processing of these inputs, executed by powerful computing servers that seem to operate as “black boxes” (Diakopoulos, 2015; Pasquale, 2016), appears to depart from the cognitive modeling approach typical of rule-based algorithms. This shift has been characterized by researchers as a near-Copernican revolution: “The immense intellectual ambition of early AI to model reasoning has collapsed, although it left significant contributions to computer science research. Connectionist machines have moved the focus of AI from solving abstract problems, the subject of orthodox cognitive sciences, to perceiving patterns within massive streams of sensory data” (Cardon et al., 2018, personal translation).
However, based on an ethnographic investigation conducted at two AI production sites, this article argues that conceptual reasoning—defined as the process of formalizing knowledge and logically organizing ideas and representations of the world (Durkheim and Mauss, 1903)—remains central to machine learning processes. The significance of this kind of conceptual reasoning is evident in the coding of traditional rule-based algorithms, where “if…then” rule formulations directly reflect the categories and logical chains embedded in the algorithmic systems. In machine learning, the shift from formal rules to large-scale data processing may suggest, as some commentators have pointed out (Anderson, 2008; Rouvroy and Berns, 2010), that such conceptual work is no longer at play. However, by paying attention to the various stages of contemporary AI production, this study reveals that as AI tools have evolved, the importance of conceptual work has not diminished. It has instead shifted from coding algorithms to the construction and curation of ground-truth data. This task is no longer reserved for computer scientists and coders, but is now distributed across multiple actors involved in AI data work, who collectively contribute to the creation of an invisible yet essential algorithmic conceptual framework.
Building on this insight, the article revisits legal scholar Lawrence Lessig's seminal claim “code is law,” articulated in 2000 in response to the rise of cyberspace. Lessig (2000) argued that computer code increasingly shaped society: “Code regulates. It implements values or not. It enables freedoms or disables them.” Lessig warned against the opacity of this emerging form of social regulation, discreetly implemented by computer scientists, and questioned the extent of its democratic oversight: “The choice is not whether people will decide how cyberspace regulates. People—coders—will. The only choice is whether we collectively will have a role in their choice—and thus in determining how these values regulate—or whether collectively we will allow the coders to select our values for us.” In the era of AI, the significance of formal code has waned, but this does not mark the disappearance of a structured conceptual “law.” Instead, conceptualization has shifted further down the production chain, spreading across multiple actors and more diffuse spaces. Ethnographic observation of these activities reveals that in the context of machine learning, ground-truth is now law. Training datasets reflect the work, choices, and goals of the actors involved in AI development throughout the production chain. Under the guise of automation and neutrality, these datasets ultimately shape the functioning of algorithms, extending Lessig's concerns and raising crucial questions about accountability and control in the AI era.
Literature review
This article builds on and expands recent social science studies that have begun to explore the inner workings of algorithms, advocating for the establishment of a sociology of algorithmic conception (Christin, 2020; Crawford, 2021; Jaton, 2021b). It especially engages with three major groups of research.
A first line of research, rooted in Science and Technology Studies (STS), has focused on the innovators involved in the development of AI. These studies examine the roles of scientific and technical professionals, particularly data scientists, whose expertise is crucial in shaping and training algorithmic models (Engdahl, 2024; Jaton, 2017). They explore how these actors’ specific worldviews influence the development of AI (Muller et al., 2019) and analyze the often-frictional interactions between their expertise and that of domain professionals in areas where algorithms are deployed (Jaton, 2023). While offering valuable insights into the dynamics between technical and contextual knowledge, this research tends to overlook the broader range of workers involved in AI development, extending beyond computer science professionals.
In response, digital labor studies have shifted attention from technical professionals to the hidden workforce of AI production. They have underscored the essential role of microwork in AI development (Casilli, 2025; Roberts, 2019), shedding light on the hundreds of thousands of workers who globally contribute to the training of AI tools (Le Ludec et al., 2023). Those studies have drawn attention to the “dirty work” (Hughes, 1962) of AI, examining the organizational configurations (Miceli and Posada, 2022; Shestakofsky, 2024) of this work and the degraded conditions in which it is carried out (Gray and Suri, 2019). While these studies highlight the significance of this labor in producing training datasets, they do not typically employ ethnographic methods, which limits their ability to precisely capture how the choices and practices involved in these processes influence the functioning of AI models.
Bridging both groups of research, a final field of study has emerged in recent years, with a more precise focus on the ways in which knowledge is formalized and embedded into machine learning algorithms. Known as “ground-truthing studies,” it focuses on the complex and multifaceted activities involved in the construction of AI training datasets (Jaton, 2017; Kang, 2023). These works have provided a theoretical framework to map out the various stages of production of AI ground-truth data, such as task definition, data collection, labeling, curation, and verification (Bechmann and Bowker, 2019; Jaton, 2023; Muldoon et al., 2024; Zając et al., 2023). Complementing these theoretical approaches, ethnographic research has focused on specific stages in this process, such as data selection (Avlona and Shklovski, 2024; Heimstädt, 2023) or labeling (Chandhiramowuli et al., 2024; Girard-Chanudet, 2023; Henriksen and Bechmann, 2020). These studies offer a detailed perspective on the choices and worldviews that shape AI datasets at different stages of their production. They draw on established methodologies from the sociology of information work and infrastructures (Bowker and Star, 1998, 1999), which has shown in nonalgorithmic contexts how data production is shaped by the actors involved in these processes (boyd, 2016; D’Ignazio and Klein, 2020; Latour, 1995). Echoing the famous claim that “raw data is an oxymoron” (Gitelman et al., 2013), precise ethnographic studies have highlighted the meticulous labor required to build databases across domains like medicine (Dagiral and Peerbaye, 2016), policing (Donatz-Fest, 2024), public administration (Denis and Goëta, 2017), and the social sciences (Plantin, 2019). These studies, carried out in non-AI contexts, offer valuable insights for understanding the construction of AI ground-truths.
Together, these lines of research illuminate the extensive and intricate workflows required to create the ground-truths that underpin algorithmic performance. However, their emphasis on specific, isolated segments of the production chain—often centered on software engineers—or their limited use of ethnographic methods means they do not fully capture empirically how conceptual operations are carried out throughout the whole production chain and the ways in which they are progressively embedded within algorithms. This gap underscores the need for further investigations that comprehensively address the social dimensions of algorithmic design and implementation. In response, this article draws on in-depth ethnographic methods oriented toward the study of complete algorithmic production processes. By doing so, it shows that conceptualization infuses each step of AI production: through the accumulation of doubts, negotiations, socially situated choices, and iterative processes along the production chain, all actors involved in the building of AI collectively contribute to the shaping of the ground-truths that underlie algorithmic functioning. Such a perspective mobilizes an STS conceptual and analytical framework, while considering AI more broadly as the product of collective labor, positioning it as a legitimate object for interactionist sociology of work (Strauss, 1975) and workplace studies (Luff et al., 2010).
Drawing on this conceptual framework, this article analyzes the production of AI sequentially, using Anselm Strauss’ (1985) notion of “arc of work.” This concept, rooted in the interactionist tradition, describes the totality of tasks involved in the realization of a project, which can be planned or unexpected, carried out sequentially or simultaneously. It makes it possible to account for the diversity of actors involved in the process, for the division of labor that organizes their relations, and for the continuous adjustments necessary for the project to reach its goal. The objective here is to describe the gradual construction of the algorithmic conceptual framework as it progresses through the different segments of the AI arc of work, engaging with the various groups of actors involved. Framing AI's evolution as an “arc of work” makes it possible to identify the multiple turning points, tensions, interactions, and feedback loops around which the AI conceptual framework is gradually adapted and reconfigured. Additionally, it offers a comprehensive perspective on all the actors contributing to AI production, in contrast to previous ethnographic studies that have often concentrated on specific groups such as data scientists and engineers or the “invisible labor” of data annotators.
Research methods
This analysis is grounded in a cross-ethnographic study of two AI projects undertaken by French judicial institutions, both initiated in 2018 but resulting in opposing outcomes. For context, the broader development of judicial AI in France began in 2016 with the enactment of the Law for a Digital Republic, which mandated the open data publication of judicial decisions. This legislative milestone led to the emergence of legal tech startups aiming to leverage these newly available datasets with machine learning algorithms. By 2018, judicial institutions joined this movement and also began exploring AI tools through internal experimentation. Legal documents, which are highly structured and codified, are often seen to offer fertile ground for AI applications (Deakin and Markou, 2020). Yet, as our research reveals, even in such a context, putting data to effective AI use requires significant preparation and conceptual work.
Case A is an AI project called Datajust, led by the French Ministry of Justice and aimed at automatically estimating compensation amounts for bodily injury claims. In France, there is no strict framework for determining compensation in cases of bodily harm (e.g. from an accident), leading to significant variations based on the individual circumstances of each victim (Wajnsztok, 2020). Compensation amounts can differ widely depending on the case, the court, or the judge. In this context, the Datajust algorithm was designed to assist legal professionals by analyzing previous litigation data and providing insights into how similar cases were handled. However, the project failed to anticipate the significant human labor required for AI training and operation and was subsequently abandoned in 2020.
Case B is Judilibre, an AI system developed by the Supreme Court to automatically anonymize the 4 million judicial decisions issued yearly by French courts before their public release. The algorithm is designed to protect the privacy of individuals mentioned in the rulings, while enabling large-scale public dissemination of these documents. Its development relied on the recruitment of a team of around 20 people, including engineers, data scientists, and annotators. Nearly 7 years later, the algorithm remains in use, supported by an expanded and permanent team.
Research on the development and use of both algorithms, combining ethnographic observation, interviews, and document analysis, was conducted between January 2020 and November 2021.
The investigation into case A involved targeted ethnographic observation of data annotation activities and algorithm training processes. Access to the field was notably challenging due to the project's highly protected nature. As a consequence, the researcher's entry occurred after key preparatory phases of the AI project, including the definition of objectives and the construction of training datasets. To mitigate these constraints, comprehensive interviews were conducted with all key stakeholders, including project-leading judges, data scientists, developers, and annotators. Additionally, a substantial body of project documentation, particularly working documents, was collected and analyzed. However, the premature termination of the project restricted the ability to follow its full lifecycle. In particular, later stages such as model re-annotation, monitoring, and correction could not be examined, limiting insights into these critical phases.
In contrast, the investigation of case B was significantly more extensive, benefiting from early access at the project's inception and full-time ethnographic immersion over 6 months, culminating in the deployment of the AI tool. This longitudinal engagement allowed for the observation of all project phases, from initial planning through to implementation. The study involved detailed ethnographic observation of the activities of all team members, including judges, data scientists, developers, and annotators, as well as the interactions and collaborative dynamics among them. The researcher also participated in collective working meetings, conducted regular follow-up interviews with team members, and systematically collected project documentation, including working papers, gray literature, and communication materials.
Overall, the study faced some challenges, primarily related to access to high-stakes, sovereign-level environments, which required the signing of confidentiality agreements in both cases. The resulting asymmetry in the research designs, arising from these access constraints, introduced certain complexities. While the differences in timing and access influenced the data collection process, they do not diminish the overall value of the insights gained. Although the methods employed in the two cases varied, both enabled a thorough examination of the entire AI production chain—from initial planning and data annotation to model training. This approach provided a nuanced understanding of the social, technical, and institutional dynamics involved in the development and implementation of AI tools in judicial contexts.
Through the comparative analysis of the failure of one project and the success of the other, this article underscores the indispensable role of data work in AI systems, as the foundation of an invisible yet essential algorithmic conceptual framework. The demonstration of this hypothesis draws, first, on an ethnographic description of the human labor involved in framing the operation of algorithmic systems. Drawing from the cross-analysis of the two case studies, it shows that conceptualization is omnipresent in AI development, manifesting in distinct but crucial ways during the five main stages of the algorithmic production chain: goal setting, dataset building, taxonomic construction, data annotation, and monitoring. In a second part, the article builds on this empirical study to discuss the particularities of knowledge conceptualization for AI. It shows that the essential, abstract, and cognitive processing that underlies machine learning is rendered invisible due to its distributed nature and its relocation to stages both upstream and downstream in the production chain. The article proposes an analytical framework to study these activities, enabling a better understanding and, when necessary, informed critique of the functioning of algorithmic models.
AI data work: A continuous modeling of knowledge
The two ethnographic studies that inform this article highlight a crucial aspect of AI: the algorithm is embodied in the multitude of workers involved in its creation and maintenance. These individuals, including data scientists, designers, engineers, annotators, and field experts such as judges, play a crucial role in the incremental formalization of knowledge that allows its automated processing. Their data work is characterized by an ongoing, collaborative effort of conceptualization, which unfolds along an arc of work leading to the operationalization of the algorithmic system (Strauss, 1985). The fieldwork identified five key phases within this AI arc of work, which intersect with the stages of AI production identified by Muldoon et al. (2024) and Bechmann and Bowker (2019): (1) goal definition, (2) dataset construction, (3) taxonomy creation, (4) data annotation, and (5) monitoring.
Goal setting
The development of an AI tool begins with the determination of the specific goals assigned to the algorithm. While this initial step may seem deceptively simple, it often becomes the first major obstacle for many AI projects. The powerful solutionist narratives surrounding AI (Morozov, 2013) often drive organizations to train machine learning algorithms on already existing internal databases, hoping to generate new insights without clearly defining the desired outcomes. This was the case in project A, in which the French Ministry of Justice initiated the Datajust project with the aim of experimenting for the first time with the training of machine learning algorithms on judicial decisions, but without a clearly defined set of goals. The institution later required external assistance to refine the project's objectives, as explained by the head of the public AI lab, who supported them throughout this process: The Ministry of Justice came to us saying, ‘So, we don't really know what's in the judicial decisions. We know many decisions are produced every day, but we don't know how to analyze them automatically.’ They had this idea that text analysis methods could extract a lot of information. But beyond that, they didn't really know what they wanted to do. So, we started thinking together. We proposed some ideas, suggesting, ‘Maybe you should start with a concrete example that would be genuinely useful to you. What would be your top priority?’ After several discussions, the issue of personal injury compensation emerged as a relevant and realistic sandbox—since it's a quantifiable type of harm and already well-studied. (Interview with AI Lab Director, Project A, January 2020)
The definition of priorities and overall objective marks the starting point of AI's arc of work, as no project can be launched without a definite direction. The objective may arise from an open desire for experimentation, as in case A, where the issue of personal injury compensation gradually became an appealing and practical case for exploration; even such experimental settings call for the preliminary delimitation of the algorithmic exploration, in terms of goals and perimeter. In contrast, objectives may be driven by more urgent or precise requirements, as was the case with project B, where the anonymization of judicial decisions prior to their public release was mandated by the General Data Protection Regulation (GDPR). For the Supreme Court, advancements in machine learning provided a timely solution to a long-standing issue, as manually anonymizing the four million decisions produced annually by French courts proved virtually impossible, and the objective preceded the launch of the project.
The goal-setting phase is crucial, as it establishes the foundation for the subsequent stages of AI development. It starts shaping the project, providing it with a general orientation (the evaluation of personal injuries, or the anonymization of court decisions). Conceptualization is already at work during this phase: objectives derive from prioritization operations and reflect a particular vision of what is valuable to pursue (Bowker and Star, 1999). In case A, for instance, the focus on personal injury addresses a long-standing concern in the French judiciary about ensuring fairness in this uncertain area, which had previously been subject to standardization processes through the establishment of scales and reference frameworks. This objective-definition phase plays a pivotal role in aligning the technical capabilities of the AI system with the specific needs of the organization.
Building training datasets
Once objectives are set, training data must be structured accordingly. This databasing stage is crucial for AI development (Denton et al., 2021) and can either come before goal-setting—focusing on what insights the data can provide, as in project A—or after, aligning data with predefined goals. Regardless of timing, training data define what AI can process (Jaton, 2021a) and form the “artificial world” (Girard-Chanudet, 2023) from which the system learns.
Research highlights how data selection shapes AI, often introducing biases (Scheuerman et al., 2020; Zając et al., 2023). As Howard Becker noted, “all data are in a sense capta” (Becker, 1952)—datasets are not neutral or autonomous objects but are, on the contrary, the product of socially situated labor, reflected in their form and content. This social structuring has major political and ethical consequences. For instance, the COMPAS software, used in US courts to predict recidivism, was criticized for racial bias due to an overrepresentation of Black individuals in its training data, prompting a recalibration of the dataset (Angwin et al., 2016; Beaudouin and Maxwell, 2023).
Although simpler in appearance, the cases analyzed in this study reveal similar challenges in dataset construction. Both teams initially relied on an existing database of appeal rulings (the JURICA database). A closer examination of interactions among the various professional groups involved in the development of the algorithmic tool reveals that, especially in case B, the choice to use this dataset was heavily influenced by the professional habits of the project leaders, who were magistrates. Judges are used to working mostly with appeal or supreme court rulings, known for their legal relevance, and, in this instance, transposed this habit to the algorithmic context. Drawing on their own experience, they expected the algorithm to function consistently across all court levels—first-instance, appellate, or supreme courts. In doing so, however, they failed to account for the significant differences in the format and content of these decisions, especially regarding the standardized nature of appeal rulings, which differ from the more varied formats of first-instance decisions that make up most rulings. A dataset consisting exclusively of appellate court rulings was therefore ill-suited for broader application, as a data scientist involved in the project explained: It's actually quite complicated. For first-instance court decisions, we realized the algorithm didn’t perform well. When I looked into it, it completely made sense. The decisions are entirely different—they're longer, more detailed, and much less formal in terms of legal language. It's not at all the same thing. But the magistrates hadn’t considered that, because they’re used to working with appellate court decisions. (Interview with data scientist, Project B, March 2021)
This excerpt underscores the disparity between legal expertise and the way in which judicial decisions are processed by machine learning algorithms. The magistrates’ familiarity with appellate court decisions led them to spontaneously choose this dataset when launching the project. However, this choice, shaped by their professional background, confined the AI to an “artificial world” dominated by appellate court rulings, making it difficult for the algorithm to process decisions from lower courts. Eventually, the development team opted to create separate training datasets for each level of jurisdiction to better address this issue.
This example demonstrates that the choice of training datasets, while often seen as a straightforward technical decision, reflects the underlying assumptions and worldviews of the workers in charge of data selection. As a result, the selection process imposes a conceptual—and, at times, moral—framework on the outcomes produced by the algorithm (Engdahl, 2024).
Constructing a taxonomy
Defining objectives and creating datasets alone are insufficient to make an AI system operational. General project directions such as estimating personal injury compensation (project A) or anonymizing decisions (project B) do not inherently specify which elements define personal injury or what must be removed from court rulings to ensure their anonymization. The challenge is to translate these objectives into a structured set of categories that will serve as a framework for machine learning. This process, known as problematization (Jaton, 2017), involves classification work, where the project's overarching goals are systematically transformed into a clear and unambiguous classification system. For example, the concept of “privacy” in the context of anonymization must be concretely defined by identifiable elements (such as names, addresses, and dates), just as “personal injuries” must be broken down into specific damage categories (physical, emotional, financial, etc.) for compensation estimation.
The effort required to construct this underlying taxonomy can vary. In case A, the team opted to leverage a pre-existing classification system: the Dintilhac nomenclature, a well-established taxonomy in the field of personal injury law. This nomenclature, outlined in a 2005 report to the Minister of Justice, categorizes personal injury into 29 distinct areas (e.g. “aesthetic damage” or “professional impact”). Originally designed to inform judicial decisions in nonalgorithmic contexts, it is widely used by judges. The AI team directly transferred this classification system into the AI production chain, as explained by the project lead: To guide the data annotation, we relied on a framework that has been in place in the justice system for years. The nomenclature defines all types of personal injury to ensure full compensation for all damages; this was a crucial intellectual help for the project. We didn't modify it at all at the start of the AI project, which really simplified our work. That's a key reason why we selected the field of personal injury for our experiment. (Interview with project A Lead, February 2020)
In this case, taxonomic activity was minimal, and oriented toward the selection of a preexisting classification system. In contrast, in case B, the construction of a taxonomy represented a significantly more intensive effort. Specialized working groups of judges selected for their high-level expertise were formed and convened regularly to define the list of elements that should be automatically deleted from judicial decisions. This task extended beyond merely identifying personal names, addresses, and dates, as it required determining which details could potentially compromise privacy, without hindering the comprehension of the legal documents once anonymized. Through a combination of legal analysis, case reviews, and collaborative discussions, the working groups ultimately identified approximately 15 elements to be redacted from decisions, including names, addresses, localities, birth dates, marriage dates, corporate entities, establishments, social security numbers, phone numbers, bank account numbers, vehicle registration numbers, email addresses, and cadastral references. This process, as this judge recalls, was characterized by uncertainties, investigations, and socially situated decision making: I worked with a selection of decisions, to see what might be problematic within them. I also consulted colleagues, who made suggestions based on their experience. Everyone came to the meetings with their thoughts. Then we had debates, we didn’t always agree, which I think is a good thing! It was really a constant back-and-forth, between the decisions and the principles we were trying to apply […]. In the end, our reflection was… How can I put it, we have to be honest, it wasn’t very scientific. It was really the most practical considerations that guided our thinking and choices. (Interview with project B magistrate, member of the working group, June 2021)
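Downstream, such a taxonomy has to be frozen into the fixed label set that annotation interfaces and models operate on. A minimal sketch of what this encoding might look like in Python (the identifiers are mine, for illustration; only the categories themselves come from the working groups described above):

```python
from enum import Enum

class AnonymizationLabel(Enum):
    """Hypothetical encoding of the case B taxonomy: the categories
    defined by the judges' working groups become a closed label set."""
    FIRST_NAME = "firstName"          # "names", split as in the annotation interface
    SURNAME = "surname"
    ADDRESS = "address"
    LOCALITY = "locality"
    BIRTH_DATE = "birthDate"
    MARRIAGE_DATE = "marriageDate"
    CORPORATE_ENTITY = "corporateEntity"
    ESTABLISHMENT = "establishment"
    SOCIAL_SECURITY_NUMBER = "socialSecurityNumber"
    PHONE_NUMBER = "phoneNumber"
    BANK_ACCOUNT_NUMBER = "bankAccountNumber"
    VEHICLE_REGISTRATION = "vehicleRegistration"
    EMAIL_ADDRESS = "emailAddress"
    CADASTRAL_REFERENCE = "cadastralReference"
```

Once encoded in this way, the label set is closed: anything in the data that falls outside it must be forced into an existing category, as the next stage of the arc of work makes clear.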
The definitive lists of elements to be accounted for in algorithmic processing emerge through equivalent processes in various contexts (Crawford, 2021). These lists form the backbone of any AI project, as they represent the classification system that will drive the activities carried out in the arc of work's subsequent segments, and that will ultimately be encoded into the algorithm. In this sense, taxonomy construction or selection is deeply political—it defines what will be visible or invisible (Bowker and Star, 1999; Star, 1991), shaping how an AI system will interpret and process data. When carried out internally, this phase is marked by intensive conceptual work, where relevant information for machine learning—such as what constitutes “privacy”—is defined, refined, and structured.
Labeling
The fourth step of AI production is data labeling. Merely connecting a dataset to a list of abstract categories does not yield a functioning algorithmic model: unlike human workers, algorithms lack the conceptual understanding required to interpret the elements they process—whether texts, images, video, or sound. As a result, at this stage, thousands of data points (in both case studies, court rulings) still need to be manually annotated, pinpointing key elements to create example datasets that the model will replicate through probabilistic calculations. Data annotation, therefore, lies at the heart of AI development, establishing the foundational “ground-truths” for machine learning.
This process demands substantial human labor, which can be carried out within diverse organizational configurations. Depending on the context, annotation may be performed by highly skilled professionals like judges or physicians (Henriksen and Bechmann, 2020) or less-experienced workers (Le Ludec et al., 2023), either within the organization or outsourced to remote locations. Various studies have examined the role of underpaid, outsourced, and often offshored labor in supporting the rapid growth of AI technologies (Gray and Suri, 2019; Miceli and Posada, 2022). However, in the cases studied here, because of the sensitive nature of the data involved, the decision was made to internalize the annotation work at both the Ministry of Justice and the Supreme Court. This configuration allowed for ethnographic investigation of annotation activities, revealing the uncertainties, doubts, and decisions that shape the annotated datasets.
In both case studies, annotators were tasked with reviewing thousands of judicial rulings. One by one, they highlighted elements corresponding to concepts such as bodily harm or privacy and associated them with the appropriate categories previously defined within the classification system—as shown in the following illustration of the Ministry of Justice's working interface (Figure 1):
Available categories are displayed on the left. In the center appears the text of the ruling, and on the right an overview of annotated elements. Annotators manually tag relevant expressions with the corresponding label, progressively shaping the conceptual dimension of the training dataset. Although annotation is often seen as a repetitive and tedious task—which can be considered the “dirty work” of AI (Hughes, 1962)—it presents significant conceptual challenges. Judicial decisions regularly contain elements that do not fit neatly into predefined categories, requiring annotators to bridge the gap between the unexpected elements present in the data and the rigid classification systems designed for the AI. One example, reported by an annotator, involved a racing horse's name cited in a decision: There's no category for that, but I find it highly identifying. A quick Google search would easily reveal the owner. I don’t think we can leave it in. (Interview with project B annotator, March 2021)
The annotator subsequently suggested categorizing the horse's name under “First Name”—noting that while this category wasn’t intended for such cases, it would at least allow the name to be anonymized, thus protecting its owner's privacy. This bridging work involves constant investigations and adjustments, as annotators attempt to assign a single label to heterogeneous and unpredictable elements within the data. They discuss problematic cases with colleagues, perform internet searches to clarify the meaning of certain expressions, and draw on personal experience to determine the correct category for a word or phrase. In some instances, certain elements resist categorization too strongly, forcing annotators to adapt or “torque” (Bowker and Star, 1999) them to fit within the classification system, as with the horse's name—an essential articulation work that ensures the proper functioning of the AI.

Case A: bodily harm annotation interface. Source: Author, based on field observations.
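Concretely, the output of this labeling work in case B can be imagined as records of the following kind (a purely illustrative sketch in Python; the projects' actual storage formats were not disclosed):

```python
# A hypothetical ground-truth record: each span selected by an annotator
# is stored with the taxonomy label it was assigned. Conceptual choices
# such as filing a racing horse's name under "firstName" end up here.
annotated_ruling = {
    "text": "Mr Lopez, residing in Lyon, owner of the horse Jappeloup...",
    "spans": [
        {"start": 3, "end": 8, "label": "surname"},
        {"start": 22, "end": 26, "label": "locality"},
        {"start": 47, "end": 56, "label": "firstName"},  # the "torqued" horse name
    ],
}

# Offsets can be checked directly against the text:
assert annotated_ruling["text"][47:56] == "Jappeloup"
```

Whatever its concrete form, such a record is what the model will later treat as truth: the annotators' situated judgments, once stored, become indistinguishable from fact.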
The annotators’ social position, their understanding of the data, and the moral importance they attach to their work significantly influence the way data is annotated, thereby affecting the algorithmic outputs. For instance, at the Supreme Court, the individuals responsible for annotation were civil servants nearing the end of their careers with prior experience as court clerks or secretaries. Their extensive knowledge of the justice system and their experience with defendants led them to frequently annotate beyond the classification system when unsure, to ensure maximum protection for those mentioned in the decisions.
By populating classification categories with concrete examples, annotators perform a vital conceptualization task that the algorithm will later reproduce. The success of an AI project largely depends on this crucial stage of the arc of work. Recognizing the significance of this role, in case B, the Supreme Court recruited a team of 15 full-time, permanent annotators dedicated to processing judicial decisions. In contrast, in case A, the Ministry of Justice initially underestimated the importance of annotation and only hired five part-time legal interns for a few months to annotate the first training dataset. The interns' departure and the lack of budget to continue annotating new datasets played a major role in the ultimate failure and abandonment of the project, as data annotation is a continuous task that must be performed regularly and at large scale to adapt to the constantly changing nature of real-world data.
Monitoring
As I followed the annotators today, I noticed that the decisions displayed on their screens were no longer entirely blank. Some elements had already been highlighted and categorized—signs of a preliminary pass by the algorithmic model over the text. The annotators were meticulously reviewing these algorithm-generated annotations one by one, deleting errors, adding words that the machine had failed to detect, and resizing incorrect annotations. They corrected what the machine had previously done. (Fieldnotes, case study B, February 2021)
This fieldnote is illustrative of the core activity of the annotation team as I primarily observed it during my time at the French Supreme Court (case B). While the annotators occasionally contribute to the structuring of new training datasets, most of their work focuses on correcting and overseeing AI-generated results. Despite the conceptual work and fine-tuning performed earlier in the arc of work, the algorithmic outputs frequently contain errors, such as false negatives (missed identifying elements), false positives (incorrectly labeled identifying elements), and category misclassifications. These persistent errors underscore the limitations of automation, which struggles to fully account for the complexity and variability of real-world data. The annotators, in their role as the final line of defense, perform critical conceptual adjustments, ensuring the accurate anonymization of legal decisions and contributing to the iterative improvement of the models. Their work can be viewed as a form of “algorithmic shepherding,” where they monitor and guide the models, making necessary corrections to ensure their proper function: Well, I don’t know what's going on with him today… but this is the third decision I’ve done, and he's missed a bunch of annotations. For this one, it's all the Hispanic names. Look, here, ‘Lopez'—not annotated. It's a big problem, if I let that slide, we could easily identify the person. So, there, I correct it. (Interview with annotator, project B, April 2021)
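The corrections described here amount to a running audit of the model. As a minimal sketch of the underlying bookkeeping (hypothetical, not the Court's actual tooling), comparing the model's pre-annotations with the annotator's corrected version yields exactly the error types mentioned above:

```python
def review(model_spans: set[tuple[int, int, str]],
           corrected_spans: set[tuple[int, int, str]]) -> dict:
    """Compare model pre-annotations with the human-corrected ground
    truth for one decision. Spans are (start, end, label) triples; a
    misclassified span shows up as one false positive plus one false
    negative, since its label differs between the two sets."""
    return {
        "false_negatives": corrected_spans - model_spans,  # missed elements
        "false_positives": model_spans - corrected_spans,  # wrong or spurious tags
    }

machine = {(3, 8, "surname")}
human = {(3, 8, "surname"), (22, 26, "locality")}
print(review(machine, human))
# {'false_negatives': {(22, 26, 'locality')}, 'false_positives': set()}
```

Corrected decisions of this kind can then be fed back as new training data, which is what makes monitoring the final, recursive segment of the arc of work.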
Much like surveillance in industrial contexts (Rot and Vatin, 2021), the primary goal of this oversight is to prevent “accidents,” such as the public dissemination of decisions containing personal information. Algorithmic failures carry significant risks, as evidenced by examples provided on platforms such as the AI Incident Database. The need for such oversight varies depending on the specific risks associated with the AI system in question (Shaffer Shane, 2023). For project B, overseen by the Supreme Court, the stakes were particularly high: as the magistrate leading the project emphasized, “there is no room for error” in the anonymization process, which is legally mandated under the General Data Protection Regulation (GDPR). Given the importance of compliance, including external scrutiny from regulatory bodies like the French data protection authority (CNIL), the Supreme Court dedicated significant resources to annotation and correction, ensuring continuous monitoring of the model's performance by maintaining a permanent annotation and surveillance team.
In contrast, for project A, the need for algorithmic oversight had not been fully anticipated. By the time significant errors were detected following the initial training phase, the contracts for the annotation staff had already ended. The Ministry of Justice lacked the necessary human resources to conduct the extensive monitoring and correction required to adjust the models, which ultimately led to the project's termination. As the project lead explained during a public review meeting: Human resources were a major bottleneck. We hadn’t anticipated this when we started the project, but automation actually requires a lot of oversight, especially in a complex area like bodily injury. We simply didn't have the capacity to ensure the reliability of the results, and it seemed better to pause the project. (Project manager project A, public meeting, June 2022)
The oversight of algorithmic output represents the ultimate stage in an ongoing conceptualization process that spans the entire arc of work of AI. This phase is essential to reconcile the plural and dynamic nature of real-world data with the rigid classification frameworks used by algorithms. The centrality of this task, whose neglect contributed to the abandonment of project A, underscores once again the omnipresence of conceptual activities in AI development. Handled by various actors at different stages of the workflow, conceptual work supports, guides, and refines the functioning of the models. It is an essential condition for producing algorithmic results that meet the inherently conceptual expectations of the actors who rely on them.
Conceptualization is a continuous process throughout the AI development workflow. From defining objectives to monitoring models, each step of AI data work involves the conceptual framing and qualification of elements within the rulings. In the different segments of the arc of work, the actors perform constant articulation work, aiming to reconcile the heterogeneous and unforeseen elements present in the data with AI's rigid classification systems. Their decisions, informed by their conceptual and socially rooted understanding of the world, become embedded within the AI, directly shaping its outputs.
The invisibility of AI conceptual work
The ubiquity of conceptual work throughout the AI arc of work questions the paradigmatic shift supposedly brought about by the advent of machine learning in terms of loss of conceptual framing. The way AI practitioners collectively participate in modeling abstract concepts—such as bodily injury in case A or privacy in case B—closely parallels what anthropologist Diana Forsythe observed during the development of “expert systems” (symbolic algorithms) in the 1980s: Expert systems are designed to emulate human expertise (…). Each system is intended to automate decision-making processes normally undertaken by a given human “expert” by capturing and coding in machine-readable form the background knowledge and rules of thumb (“heuristics”) used by the expert to make decisions (…). Building an expert system typically involves the following steps: 1. Collecting information from one or more human informants and/or from documentary sources 2. Ordering that information into procedures (e.g. rules and constraints) relevant to the operations that the prospective system is intended to perform 3. Designing or adapting a computer program to apply these rules and constraints in performing the designated operations. (Forsythe, 2001, p. 39)
Forsythe's research highlights how symbolic AI engineers engaged in “knowledge acquisition” activities to model both the formal and tacit knowledge that their algorithms were intended to replicate through rigid sets of written rules. Contrary to the “Copernican revolution” associated with the rise of machine learning (Campolo and Schwerzmann, 2023), these processes closely mirror the taxonomy and annotation work involved in the training of contemporary algorithmic models. The key distinction between symbolic algorithms and machine learning only arises during the third step identified by Forsythe. In machine learning, conceptualization is no longer embedded directly into the code, but rather into the datasets used to train the models. Consequently, efforts to frame knowledge and structure information are now focused on the composition of these datasets. In machine learning, conceptual work does not disappear; it remains at the core of the formation of the “ground-truth” that supports model training. Yet, this work is rendered largely invisible (Star and Strauss, 1999): first, because it is distributed among numerous actors across different stages of the AI arc of work (“The distributed work of AI” section); and second, due to its concentration at both the upstream and downstream ends of the algorithmic production process (“Conceptual void?…” section).
The distributed work of AI
The conceptualization of knowledge required to build machine learning ground-truths is marked by its fragmentation. While Forsythe observed that conceptual work in the 1980s was concentrated in the hands of engineers—who were responsible for interviewing experts, translating their knowledge into algorithmic rules, and operationalizing expert systems—modern AI development reveals a significant division of labor in modeling tasks. These tasks are distributed across various stages of the AI arc of work, where they are handled by heterogeneous groups of actors, each contributing in different ways to the design of the “artificial world” of AI. Ground-truths result from distributed cognitive processes (Hutchins, 1996), involving both a variety of actors along the production chain and the artifacts they rely on, such as reports, guidelines, software interfaces, and legal decisions. They reflect a gradually constructed compromise, shaped through successive decisions and adjustments made throughout the arc of work. In both cases examined here, magistrates, data scientists, and annotators each play a role in transforming judicial decisions into AI training data, drawing on their unique expertise, understanding of the project, and personal values—elements that vary significantly along the AI production chain. This is reflected, for example, in this excerpt from an interview, where an annotator of project B explains that she sometimes makes choices that differ from the given instructions in order to better protect the individuals mentioned in the decisions: So, they told us not to annotate the dates. I didn’t really get it, I think it's because the machine already recognizes them pretty well or something. But, I don’t know, I feel like it might be important for the person if we let their birthdate slip by? I’m not sure, but to be safe, I sometimes still annotate it anyway. (Interview with project B annotator, April 2021)
The distributed nature of conceptual work in AI carries significant consequences, as it reconfigures and obscures accountability for the construction of ground-truths. Fragmentation blurs the hierarchical structures initially intended to guide the project and its objectives, as the cases studied in this article show. In both cases, significant efforts were made to legitimize the AI's classification system from a juridical point of view: since both projects were led by judicial institutions, legal expertise was prioritized in shaping the algorithmic taxonomy. In case A, the use of a well-known nomenclature, employed for many years by magistrates within the world of justice, reflects the managers’ desire to base the project on solid legal grounds. Similarly, in case B, the creation of a taxonomy was entrusted to a select group of magistrates renowned for their legal expertise. These choices underscore a clear political aim of anchoring the projects in an unequivocal legal authority, which was also reflected in the hierarchical composition of the teams, both led by magistrates. Conversely, the other members of the team, and especially annotators, are positioned at the lower end of the professional hierarchy. Whether they are interns (in case A) or lower-tier administrative workers (in case B), annotators are perceived as mere executors, expected to apply the predefined taxonomy strictly according to instructions, without letting their personal values or interpretations influence their work. This perception can be reinforced when annotation work is outsourced, sometimes conducted remotely or even internationally. Yet, the ethnographic observations of annotation work conducted in this study reveal that this portrayal is far from accurate. The unpredictable nature of the data frequently compels annotators to make crucial decisions, influenced by their social background and their understanding of their role. For example, several of them, owing to their previous experience in court registry services, have a clear picture of the lives of individuals sentenced in judgments. As a result, they tend to annotate prison addresses as if they were residential addresses, even though this is not specified in the annotation guidelines. The manner in which annotation is conducted—along with the creation of training datasets and the correction of algorithmic outputs—contributes just as significantly to the final results as the initial goal setting and categorization processes, despite the pronounced hierarchical imbalances in the organization of algorithmic production.
As a result, the distribution of AI's conceptual work along a long arc of work makes it difficult to identify potential points of failure or assign responsibility when the system malfunctions. Ground-truths emerge from the layering and intertwining of decisions made by actors who are often far removed from one another along the production chain. Once the dataset is established, the black box of AI is closed, and it becomes difficult to determine whether the model's performance issues stem from category definitions, data selection, or annotation choices. All activities along the AI workflow are consolidated into a singular output which, once produced, offers little room for questioning or critique—much like quantified objects. Tracing and identifying bottlenecks requires detailed, step-by-step reviews of the production chain through targeted evaluations and audits (Christin, 2020)—a process that is both complex and time-consuming.
Conceptual void? The displacement of conceptual work to the invisible side of AI
The conceptual work underpinning machine learning is not only fragmented but also displaced away from the core of algorithmic information processing. In traditional rule-based algorithms, conceptual work is directly embedded into computer code, in the form of formal rules (Alcaras and Larribeau, 2022; Marino, 2006). For example, for the automatic anonymization of surnames in a document, a rule might state: “If the preceding term in the sentence is ‘Mister’ then annotate the following term as ‘surname.’” In such cases, the code directly embodies the underlying conceptualization, objectives, and worldviews, which led Lawrence Lessig (2000) to famously state that “code is law.”
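As a minimal, hypothetical sketch of how such a rule-based approach works, the rule above can be written as a few lines of Python in which the legal concept is directly legible:

```python
import re

def annotate_surnames(text: str) -> str:
    """Rule-based anonymization: the legal concept ("a surname follows
    the civil title 'Mister'") is written directly into the rule."""
    # The conceptual framing lives in this pattern, visible and
    # auditable in the code itself: this is what "code is law" captures.
    return re.sub(r"\bMister\s+[A-Z]\w+", "Mister <SURNAME>", text)

print(annotate_surnames("The court heard Mister Lopez on Tuesday."))
# -> The court heard Mister <SURNAME> on Tuesday.
```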
With the advent of machine learning, however, this conceptual dimension disappears from the algorithmic lines of code. The models themselves are context-agnostic tools, a characteristic that derives directly from their modes of creation: machine learning models are mostly developed and released by major tech companies, independently from the diverse organizations that will use them (Prietel and Raible, 2024). In case B, for instance, data scientists reused pre-existing natural language processing models developed by Facebook and Zalando. These models are “classifiers,” which organize the words of any textual dataset into a vector space based on their relative proximity. This vector space, referred to as an “artificial world” (Girard-Chanudet, 2023), allows the model to probabilistically process new data by comparing its similarity to the training data. Machine learning models are mathematical tools; much like calculators, they perform probabilistic operations independently of the nature of the provided inputs. Thus, unlike rule-based algorithm programmers, data scientists are not involved in conceptual work. Their expertise is primarily centered on selecting the most appropriate models and making the necessary adjustments to optimize their outcomes, relying mainly on statistical and mathematical skills. This technical setup creates the impression of a “conceptual void” around AI, as it gives the illusion that models process raw information, from which they seemingly identify patterns and almost magically produce results through statistical operations.
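The contrast with the previous sketch can be made concrete. In the following toy illustration (using scikit-learn as a generic stand-in, not the pre-trained models actually reused in case B), the training code is entirely generic: the concept of a “surname” appears nowhere in it, only in the annotated examples, that is, in the ground truth:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Ground-truth examples produced by annotators: the concept "surname"
# exists only here, in the labeled data, never in the code below.
tokens = ["Mister", "Lopez", "Tuesday", "court", "Dupont", "Madam"]
labels = ["O", "SURNAME", "O", "O", "SURNAME", "O"]

# The model is a context-agnostic statistical tool: the very same code
# would classify medical terms or product names if fed other labels.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),  # character n-grams
    LogisticRegression(),
)
model.fit(tokens, labels)

print(model.predict(["Lopez"]))  # learned from examples, not from a rule
```

A real training set would of course contain millions of examples rather than six, but the asymmetry is the same: the conceptual content has migrated wholesale from the code to the data.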
Yet, as this article shows, the ethnographic observation of AI production chains reveals that conceptual work is still very much present in machine learning. The training of algorithmic models—used as premade tools by the data scientists in both case studies—represents only a minor fraction of the labor involved in the process: AI work is primarily data work, rather than algorithmic work. In machine learning, conceptual work is displaced from code writing to the construction of ground-truth data. This shift relocates conceptual labor outside the direct algorithmic chain, into what I term the “invisible side of AI,” where it is handled by actors without traditional algorithmic expertise, such as judges or annotators.
This reconfiguration, along with the fragmentation of conceptual work, plays a central role in rendering such work invisible (Star and Strauss, 1999; Suchman, 1995)—a process illustrated in Figure 2 below.

Rules versus machine learning: the displacement of work.
Conclusion: Deconstructing the myth of AI autonomy
This article challenges the idea that machine learning algorithms are linked to a decline in human conceptual labor. It questions the paradigm shift that these techniques are thought to have induced relative to the symbolic algorithms developed in the latter half of the twentieth century. Drawing on a comprehensive qualitative methodology, including ethnographic observation of the complete production processes of two AI tools developed within the French justice system, this study demonstrates that AI is fundamentally dependent on the work of multiple actors engaged in the shaping of algorithmic operations. It proposes to view AI design as an “arc of work” around which diverse and heterogeneous groups of actors—including data scientists, engineers, annotators, and magistrates—gather in varying configurations and timelines. Five key stages of this arc of work are identified: (1) goal setting, (2) databasing, (3) taxonomy construction, (4) labeling, and (5) monitoring.
The article's findings highlight three key insights. First, conceptualization remains central to the development of machine learning algorithms, permeating every stage of the production chain, from goal setting to the oversight of algorithmic models. AI workers continually frame the information embedded in training data, ensuring that the standardized processing methods characteristic of algorithms can accommodate the fluid and multifaceted reality mirrored by the data. The functioning of AI relies on conceptual articulation work (Suchman, 1996), which becomes embodied in the ground-truths. The way this work is anticipated, particularly during the initial framing of a project, greatly influences its overall success. For instance, the failure of Project A, which was abandoned 2 years after its launch, can be understood in light of the insufficient human resources allocated to data handling and the lack of foresight regarding the need for conceptual framing of algorithmic models. In contrast, the success of Project B, whose algorithm continues to anonymize public court decisions in France, owes much to the team of annotator-supervisors recruited and retained by the Supreme Court. However, despite its centrality, this conceptual labor remains largely invisible, in part because of the strong imaginaries of autonomy often associated with AI systems (Cave et al., 2020). Two interrelated mechanisms contribute to this concealment.
On the one hand, AI's conceptual work is distributed across a wide array of actors along the arc of work. The algorithmic production process is highly fragmented, with many tasks often outsourced or even offshored by producing organizations. For instance, the annotation of training data—where much of the conceptual burden lies—is frequently performed far from decision-making centers, leading to significant devaluation (Le Ludec et al., 2023). This fragmentation complicates the identification of accountability, as ground-truths emerge from iterative exchanges, negotiations, and collective decisions made throughout the production chain.
On the other hand, compared to traditional rule-based algorithms, machine learning shifts conceptual work both upstream (to the phase of dataset building) and downstream (to the phase of model monitoring). Conceptualization is no longer tied to the process of algorithmic coding itself but is displaced to the phases of data work leading to the construction of training sets. This shift plays a major role in concealing the ongoing conceptual work. In machine learning, “code is law” becomes “ground-truth is law,” with significant consequences for rendering actors accountable for their choices and for the methodological framework necessary to study algorithmic design processes. This article proposes to continue unpacking the algorithmic black box by developing processual ethnographic methodologies aimed at following complete AI arcs of work. Such methods make it possible to investigate the gradual conceptualization work involved in AI production by taking into account the multiple stages, tasks, and actors involved in the algorithmic factory. This investigative and analytical framework could be effectively extended to contexts beyond the justice system, especially in configurations where part of the tasks are outsourced, offering valuable insights into how conceptualization is integrated into machine learning systems and enhancing the understanding of their operational dynamics.
Acknowledgements
I would like to thank the International Publication Support Program at CEMS (EHESS), and in particular Véra Ehrenstein, Claude Rosenthal, and Laura Centemeri, for their insightful feedback and proofreading. I am also grateful to Nicolas Dodier and Valérie Beaudouin for their valuable support and guidance throughout this research. Finally, I express my sincere appreciation to the AI Department of the Court of Cassation and the Ministry of Justice for granting access to the fieldwork.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by Paris Region Ile de France through the Paris Région PhD2 program (grant number Paris Région PhD2 2019).
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
