Abstract
Artificial intelligence advancements have reignited job displacement debates that focus on how the use of artificial intelligence affects labour, without considering how the production of this technology influences labour division. The generalisation of machine learning has created an increased demand for outsourced data workers. Outsourcing companies and crowdwork platforms are both used to generate, annotate, and enrich data. These data tasks are performed by workers from low-income countries, who often earn poverty wages. As with traditional outsourcing, workers must fit into complex multinational subcontracting networks. In this article, we examine how France outsources artificial intelligence-related tasks to workers in the African island nation of Madagascar. For our study, we interviewed 26 data workers and eight employees of French start-ups, and conducted secondary research on two artificial intelligence systems – a canteen checkout terminal and an algorithm to detect shoplifters in stores. The data collected allowed us to reconstruct an end-to-end artificial intelligence production value chain, revealing the need for data
Introduction
Recent years have seen a proliferation of commercial artificial intelligence (AI) products, from voice assistants to generative AI. This growth has reignited an ongoing debate about job displacement. An egregious example of this is the March 2023 OpenAI study on generative pre-trained transformer (GPT) models (Eloundou et al., 2023), according to which 80% of the workforce will be affected, including high-skilled positions. The study’s methods and many limitations have been compared to a controversial 2013 publication by two Oxford researchers who predicted 47% job losses due to AI by 2030 (2017). Both publications adopt a reductionist approach that breaks down human work into a series of tasks. They then evaluate how many of those tasks could be performed by AI.
Over the last decades, both this approach and more general theoretical analyses about the ‘end of work’ (Rifkin, 1995) seem to manifest cyclically, coinciding with waves of technological innovation. These debates share one common flaw. While they focus on the impact on labour of commercial AI adoption and usage, they overlook the way AI production affects labour division.
A new demand for outsourced data work has arisen due to machine learning generalisation. In addition to traditional outsourcing companies, crowdwork platforms are also used to generate, annotate, and enrich data. For a few cents, AI producers can hire and dismiss hundreds of thousands of workers via services such as Amazon Mechanical Turk. Australia’s Appen offers more stable piecework, providing short gigs for over 10 million workers. In most cases, this ‘microwork’ is performed by workers from low-income countries, who earn poverty wages.
From Amazon to OpenAI, from start-ups to big tech corporations, AI producers outsource data work. In an article published by TIME magazine a few months after ChatGPT launched, it was revealed that Kenyan workers were paid less than USD 2 per hour to train the chatbot (Perrigo, 2023). In other documents, the U.S. company claimed it contracted workers in the Philippines, Latin America, and the Middle East (Stiennon et al., 2022).
The article argues that the true impact of AI on labour is not selecting talent or replacing formally employed workers, but fostering an increasingly precarious globalised workforce of underpaid pieceworkers. Our goal is to explore the perspectives of workers from both sides of the production chain and organisational aspects of data work. An end-to-end analysis of AI production chains reveals several parallels with traditional outsourcing, as AI fosters a redistribution of service labour to consumers and workers in the Global South.
Literature review
An increasingly significant part of the literature on digital labour studies has focused on the emergence of
Regardless of how it is procured, data work is essential to the production and maintenance of commercial AI-powered services. Commercial AI builds on scientific research into AI, but is primarily focused on finding applications in business and industry for various purposes, including automation, prediction, and decision-making. According to Tubaro et al. (2020), there are three contributions microworkers make to commercial AI systems: by generating and annotating data (
A global offshoring context can re-inscribe data-labour externalisation practices as strategic decisions. In economic geography, literature focusing on microwork spatial distribution highlights how AI companies in the Global North outsource data work to Global South workers due to lower labour costs and advantageous labour laws (Graham and Anwar, 2019). This article examines how data work is outsourced in a broad sense by analysing every data work process in the AI value chain. This approach also allows us to consider the end-product effects of the externalisation structure. Along the production chain, we can also consider labour conditions and extractive practices.
Industrial companies have been outsourcing an increasing part of their activities since the 1980s. Using examples from the French nuclear and automotive industries, Appay (1998) links the dynamic with lean production. In these decades, mass layoffs occurred as a result of these novel ways of organising labour. This was not due to a decrease in labour demand, but rather to an increase in the externalisation of tasks. In Appay’s view, this phenomenon is characterised by a cascading externalisation chain: contractors located far away from the hiring firm are the most locally embedded and employ the most precarious workers. Due to this complex structure, outsourcers face high coordination costs, as one contractor defaulting may cause major disruptions. Hiring organisations therefore tend to control contractors’ production processes directly and tightly to mitigate those risks. Shifts in the structure of the production chain are dynamic, however, and are affected by several factors, including innovation in information and communication technologies (ICT).
A study of externalised contract manufacturers by López et al. (2021) shows that digitisation plays a role in many aspects of logistical and production chains. A complex network of locally embedded subcontractors produces, moves, stores and ships garments. Companies can develop diverse fashion lines and scale up and down production quickly by outsourcing garment design and production to this network of small to mid-sized contractors. Digitisation of the bulk of inventory management systems partially offsets the high organisational costs of distributed production processes. Digital devices, such as RFID, have also transformed and ‘taylorised’ service jobs in retail and logistics. Information and communication technologies have offset some of the costs induced by labour-intensive activities in the service sector, according to Esparza (1992). Unlike agriculture and manufacturing, the service sector has not experienced the progressive replacement of labour with capital. Due to a more linear relationship between customer acquisition and labour requirement, most service providers observe low productivity. The use of contingent workers is an efficient way to increase revenue in this context. Contingent workers are more likely to belong to vulnerable groups (minorities, women, and elderly persons) (Noyelle, 1990).
Thus, parallels can be drawn between the patterns of externalisation described above in manufacturing industry sectors and the externalisation of data work by AI firms. Traditional manufacturing (as well as agricultural work and mineral resources) intersects with the distribution, circulation, and consumption of different kinds of information. The connection illustrates what Fuchs (2016) describes as an ‘international division of digital labour’. In order to drive wages down and increase profits for tech companies, data production is increasingly fragmented and distributed across the globe. This creates an asymmetrical geography, where dependencies and imbalances in power and wealth reproduce historical colonial patterns. Although we will not directly address the topic of data colonialism (Couldry and Mejias, 2019) and digital coloniality (2017) in the present article, this notion represents an undercurrent of our work and informs our focus on the role of human labour in AI production across its entire value chain, as well as the role of ICTs in sustaining and enabling the externalisation of data labour by AI companies. AI therefore becomes a central component of the performance of various workers, displacing it both spatially and chronologically.
As shown above, the literature on data work has mainly focused on disclosing the human labour embedded in the AI production process. Several authors have shown that human workers are essential for producing algorithms, and have studied their roles and working conditions. Here, we propose a different perspective on data work to deepen the analysis. Data workers – and AI systems – should be reintegrated into the production chains of the goods and services they craft. In this way, we can not only ask who performs data work, but also study AI as part of a broader global labour market transformation. The question then becomes: how is AI used in production?
Methods
The fieldwork results presented here are part of a larger research project dedicated to the outsourcing of the production and annotation of data needed for AI in European, and more specifically French, start-ups. As part of this investigation, we studied the entire AI production chain from a more in-depth perspective, focusing on four key actors: employees using AI-enabled devices at work, AI companies, annotation companies, and their employees. 1
Here we present two case studies: the first one focuses on an automatic canteen cash register produced by the French start-up Automatik 2 for CanteenCorp, while the second focuses on an image recognition system developed by AIView for detecting in-store shoplifting (all names have been changed for confidentiality reasons). For those case studies, we conducted 34 remote interviews between February and December 2021 with annotators, annotation managers and start-up employees, and drew on secondary data sources taken from the communication of the two start-ups studied (corporate websites, social media presence, etc.).
These two case studies allow us to account for a wide range of annotation types used by AI start-ups, while remaining similar enough to allow comparisons. Both start-ups use the same contractor and employ about 70 people in France, but one develops a specialised product, while the other sells trained models to third parties. Fieldwork was conducted in several phases. First, we contacted Automatik and interviewed five employees: two co-founders, one ‘annotation manager’, one engineer, and one solution architect.
We were also granted access to a Slack server used to coordinate labour between Automatik and their Malagasy contractor. When doubts arise about their tasks, annotators use this channel to ask for clarification. Interview topics included data-work outsourcing decisions, the way data work is organised, and how clients and subcontractors interact. This start-up connected us with subcontractors who hired data workers. Located in Madagascar’s capital Tananarivo, DataMada employs approximately 100 people, mainly for two French clients. We then set up a mixed-method survey combining a questionnaire and semi-structured interviews. Qualitative data from this fieldwork is used in this article. We interviewed 21 data workers for the Automatik canteen solution and seven for the AIView shoplifter detection project between September 2021 and January 2022. Interview topics included their professional background, annotation activities, and interactions with French AI start-ups.
We also studied AI models’ end-users. In particular, we interviewed Automatik’s solution architect, who handles start-up customers’ feedback and difficulties regarding the scan-and-go checkout solution. In our other case study, we interviewed the manager of a store that uses AIView. The purpose of our study is to explore how cashiers and security agents in French stores use the system and how it may affect their work. Additionally, we analyse secondary data sources, such as company websites.
Human labour behind AI-based services: Insights from two case studies
We studied two specific AI applications deployed by several French companies in different industries. As a starting point, we present both case studies through the lens of the human work that goes into creating and maintaining AI-based services.
CanteenCorp’s automatic checkout machine
With subsidiaries in 14 countries and more than 13,000 employees worldwide, CanteenCorp is an international industrial catering company. The efficiency of the check-out process and the cost of running a canteen are among its concerns. The company is experimenting with automated payment terminals that are expected to handle food detection, billing and payment processes with minimal human intervention. Using a computer vision algorithm, this terminal will identify food on trays and sort it. This algorithm was designed and trained by the French start-up Automatik.
Human input is supposedly only required when a machine vision model is unable to come to a reliable conclusion. First, the system involves the consumer, and if that fails, a restaurant employee, to scan the food and tag it according to predefined categories.
Even though the computer vision model is designed in Paris by a French start-up, the data annotation needed to build the ground truth datasets necessary to train the model is delegated to a contractor in Madagascar. Malagasy data labellers receive unsorted pictures from the international catering company and they segment the images using the French start-up’s platform. They identify different kinds of food on plates, and sort them according to a complex label structure.
According to the start-up’s Chief Executive Officer (CEO), digital work platforms such as Amazon Mechanical Turk differ from the Business Process Outsourcing (BPO) structure in many ways. For the development of automatic products, he prefers traditional outsourcing, mainly because it allows the existence of strong ties between the French start-up and the Malagasy annotators:
A strict division of labour governs the dataset production process at the French start-up. First, solution architects identify customer needs. A first annotation guide is then drafted with the help of data scientists, describing the relevant labels and categories. Madagascar’s annotation team leaders receive the guide, discuss its implementation, and train annotators. The annotation of the dataset can then begin. An extensive feedback loop has been observed, following a strict hierarchy on the part of the Malagasy contractor. Expert annotators or team leaders receive comments from annotators about labels and their suitability. Afterwards, the information is transmitted to the start-up’s solution architect. There follows a new cycle of adaptation of the guide and labels.
For now, the solution architects make a document detailing the instructions. The leader reads the instructions and when everything is clear, he explains it to the annotators who will have a training document to accompany them during the processing. When they encounter a difficult case, they consult the leaders. If [leaders] can’t answer, they ask us. Our solution architect and I have a Slack channel where we answer questions very matter-of-factly. One or two hours after the annotators start processing, the leaders start the production check. Interview – Dataset manager, Automatik
Dataset production is both dynamic and cyclical, particularly for labels which are necessary for a standardised classification to be used when training a computer vision algorithm (Jaton, 2017). Newly designed dishes, cutlery, plates and other items introduced in restaurants can’t be recognised by the computer without recently annotated data. Other issues arise as a result of consumer behaviour. For instance, while deploying the system, the start-up noticed that consumers were putting their phones on trays. Since this might have been detrimental to the privacy of end users, the French start-up requested that annotators identify consumer phones on the trays in order to blur them.
Additionally, quality control of the dataset is distributed along the production chain. Because the performance of the model is heavily dependent on the quality of the dataset, labelled images are manually verified by the Malagasy contractor and then technically audited by the French AI start-up. Project details and deadlines heavily influence the verification process. While some AI projects require real-time data classification, others delegate some of the verification process to their users – in the case of this check-out system, as we have seen, canteen employees and consumers are the ultimate judges. In the end, the question arises: Who catches the algorithm when it slips? What is the importance of preventing false positives and ensuring that true positives are not missed? In many ways, the answer to these questions determines how the production chain will be shaped. Automatik relies on the consumer to ensure that the algorithm works properly:
Automatik customers interact with our system. In the case of disagreement [with the algorithm’s output], they can correct it. – Automatik solution architect.
AI companies require stable partnerships with professional annotators due to the importance of dataset quality and the circular nature of the data annotation process.
Often, the initial brief isn’t very good. There is always something missing. It never covers edge cases. So we update the documentation, we give it back to the annotation team, and ask them to redo the work to increase data quality. In time, we get to know our annotators better, and they tell us how to improve our tools because they know them well. – Automatik CEO
In addition, due to the low cost of labour provided by the Malagasy contractor (data annotation accounts for between 1% and 5% of AI project costs), each actor in the chain is strongly discouraged from trying to optimise it, as a result of which the quality of the dataset may suffer:
The truth is, it’s a low-value task. Once we have a system that works, I don’t want to endlessly second-guess myself about it. The real issue concerning the quality of the dataset concerns the diversity of the pictures we feed it. Once [annotation] is solved, there’s no point in talking about it forever. – Automatik CEO
Nevertheless, three main points emerge from this case study. First and foremost, the annotation process requires constant updating and amendments. The second point is the international division of labour between the French AI engineers and managers, and the Malagasy subcontractor. Lastly, this framework assesses the performance of AI systems by using consumers and canteen employees as last resort options.
AIView’s theft-detection intelligent cameras
AIView is a French AI start-up that launched in 2018. In 2022, it employed about 50 people. The company adds a computer vision algorithm to already-installed store surveillance cameras to detect thefts automatically in an effort to prevent shoplifting. Using the service, security guards, cashiers, or store managers are supposed to be able to receive automatic alerts on their phones. According to AIView’s social media communication, they supply almost 5000 cameras to over 100 stores in France, Australia, Belgium, and the UK.
Clients including supermarkets and drugstores contract the start-up to automate video surveillance. On the company’s website, several phases are described: the algorithm is first connected to surveillance cameras, and tests are conducted; afterwards, the algorithm can detect
There is a preparation phase that must be completed before Malagasy workers can begin annotating data. Both start-up and supermarket employees are involved in it. As explained by the store manager during an interview, first AIView technicians visit the store to observe how it operates. Using their computer systems, they also analyse surveillance camera recordings in order to identify thefts and design labels for annotators:
When training is completed, data labelling starts. Data workers’ tasks are much broader than simply annotating videos, we discovered after several interviews. They receive video sequences lasting between 6 and 8 seconds over a digital platform. Sometimes this footage shows people stealing, damaging, or unpacking products, although of course the majority of shoppers act in more innocuous ways. As one data worker puts it:
This information led us to the conclusion that what the annotators are viewing
There are, however, some videos that are not processed in real time:
The alerts are real-time videos and the analysis consists of looking for thefts and labelling them, if it’s a theft or if there’s nothing to report. There are other tasks in our team too (…), including the annotations [of videos] that we accumulate on a server. When we label these videos we evaluate our performance. – Data annotator
So when they are not annotating livestreamed images, data workers label non-live videos that are stored on a server for asynchronous processing. Once annotators have finished labelling a batch of these videos, they must also review other workers’ annotations by double-checking them. These activities are consistent with the other two types of data work listed by Tubaro et al. (2020), namely
Annotators must also adapt to the technological advancements of the model when producing this AI service. That also entails adapting to the preferences of the final clients, as well as incorporating into their labels unusual types of theft or suspicious behaviour. In the following excerpt, HR managers of the Malagasy annotation company describe how they identify and handle non-anticipated theft gestures. These complex cases are discussed first internally, either among annotators or among annotators and team leaders in Madagascar. In the words of a data annotator:
We also communicate with the client directly (…). I would say every fortnight, because (…), there are always updates on the [annotation] platform. On our side we do our best to participate, and not only as executors. When it comes to label names, there are shoppers (…) who hide the products in their clothes, so there is no label for them.
Yes, that’s right. Shoplifting techniques evolve indeed. We try to detect them, and then report them. The client then updates the [annotation platform] and creates labels for them. – Data annotation HR managers
As with the Automatik checkout system, the success of the model depends on the quality of the service. How effective is the interaction with the
As a result, the AI system links data workers’ decisions with supermarket staff’s actions. As explained by a data annotator, decisions taken by Malagasy data workers affect the behaviour of department store workers in other countries:
The relationship between AI and labour: Towards several displacements of labour
Through the examination of these case studies, we have explored the role of human labour in AI systems. What can we learn from them about AI work and its effect on work? This section presents three findings from our fieldwork showing that, rather than eliminating work, AI merely distances it.
Integrating data workers to monitor dataset quality issues
In our fieldwork, we found that data production differs substantially from what the international literature describes as platformised micro-work. Possible explanations for this could be either the persistence of traditional forms of firm-based outsourcing, or the evolution of the AI industry towards longer-term commercial projects, as argued by some of the persons interviewed in our case studies.
In these two cases, subcontracting through complex networks of local intermediaries is more common than using self-service internet portals like Amazon Mechanical Turk to find workers. We interviewed workers who admitted to using ‘platforms’, but they mostly meant online software provided by their BPO employers, rather than full-fledged marketplaces for crowdwork. The French start-ups we examined externalise this activity, but not to a crowd of anonymous users on an online platform. Their direct relationship with suppliers enables them to control workers almost as if they were their own employees – in spite of the fact that they are not required to provide comparable salaries, work stability, and benefits.
According to our interviewees, using remote crowdwork to produce data is fundamentally different from working with long-term local subcontractors. Interviews with Automatik officials reveal that direct subcontractors provide better quality datasets than platform workers. According to our two case studies, annotators must develop expertise in specific tasks. Managers and startup employees support and train workers both when they perform core functions (such as detecting theft or distinguishing different dishes on a tray) and when handling annotation tools. This, in turn, affects the flow of information and the shape of the production chains in the organisations studied.
Honestly, we looked at Amazon and their system, but we quickly abandoned it. From an annotation standpoint, the quality is rotten. We wanted people to develop some kind of expertise. If random people do this for a small amount of side money, the quality [of the work] will most likely be affected. If you mess up 100 percent of your annotations, it’s pointless. – Automatik CTO
The quality of data annotations is therefore the main criterion for discarding commercial micro-work platforms. The BPO model is seen as a better way to integrate workers into the production process. The development of worker ‘expertise’ and the dynamic nature of annotation guides and labels are essential for long-term AI projects. The skill building process at Automatik is formalised with the creation of ‘expert annotators’ who specialise in identifying one type of dish (starters, main dishes, etc.).
And when I compare it to MTurk, etc., the quality [is better]…Knowledge and expertise are still building as we move along, and [the data workers] are doing assignments tailored to the typology of our customers and what we offer them. We can get real product feedback. – Automatik CEO
We found that workers communicate regularly with startup employees in order to achieve this level of expertise. Slack conversations, as well as telephone meetings, are used to communicate with them about annotation guides. Startup employees and team leaders discuss doubts and common mistakes in annotation. In addition, they are systematically given initial training on projects and paired with more experienced annotators until they become comfortable with the job.
It is essential to contextualise these results, as both models are interrelated. With crowdsourcing platforms, labour is allocated through APIs, according to price mechanisms and gamification elements such as badges and scores. The majority of the control is allegedly performed by algorithmic management, so human management is not specifically required. BPOs, on the other hand, handle communication with clients and the briefing of instructions through personal interaction. This implies creating management posts such as team leaders and experts. The enriched data produced by both platform and BPO workers is part of a pipeline that is ultimately controlled by the AI startups, and both systems rely on software-as-a-service to produce the annotations. Moreover, even on crowdsourcing platforms, recent literature underlines the spread of direct management and worker-verification tools (Hauser et al., 2022), challenging the argument that work is allocated and managed purely through API-like means.
The distinction between BPO and platform is therefore ambiguous. Both of them originated from earlier organisations, like call centres and teleservices, to disaggregate information-intensive functions (Apte and Mason, 1995). Compared to pre-internet modes of outsourcing, both of them are technology-driven (Borman, 2006) and partake in the imaginary vision of establishing and running ‘virtual business units’ in which most or all of the functions are outsourced to remote services (Motahari-Nezhad and Stephenson, 2009). Both models are seen as viable options for supporting outsourcing and minimising costs through global labour arbitrage (Borman, 2006). In short, today’s interconnected data services, whether they are BPOs or ‘pure platforms’, are forming flexible processing units (Person et al., 2018).
In light of this continuity between the two models, what factors influence the decision to adopt one over the other? Although this question cannot be answered definitively, some assumptions can be made. The rapid development of the AI sector may be showing signs of maturation, with AI companies moving from R&D to production. As a result, the organisation of the data production structure needs to change.
A reliable and stable supply chain is essential for companies developing ‘last-mile automation’ products. Some side projects could function with datasets annotated by an anonymous crowd, whereas specific, long-term commercial products require a stable, trained workforce for maintenance, verification, and feature extension. A long-term relationship with the annotators is maintained through this structure, which ensures continuity and quality of service.
Inside data work: The distributed cognitive work of AI workers
Data workers are integrated according to specific business imperatives. As a corollary, data work involves many actors at different levels of the chain, who will each contribute to AI. As shown in the following figure, this process is distributed among AI companies, annotation companies, clients, and consumers. The arrows represent the interaction between workers involved in AI work.
Each project requires annotation work from several types of employees at the annotation company in Madagascar. At one end of the data supply chain, ordinary annotators use basic tags (e.g. ‘theft’ and ‘dish’) to roughly outline elements in pictures and videos. After that, quality teams composed of Malagasy verifiers, sometimes known as ‘expert agents’, review the annotations. Expert annotators examine hard-to-interpret images, and some specialise in particular types of objects or situations (e.g. certain dishes for food projects and certain types of shoplifting in surveillance projects). Cross-team communication is conducted via an online commercial platform, Slack, which serves as a remote workspace. Within teams, communication between annotators, experts and management (team leaders and production officers) occurs locally and in person.
The French part of the value chain develops annotation production processes. Both AI startups we studied have their own proprietary online annotation software that allows them to monitor annotators’ performance and quality. A ‘dataset manager’ oversees annotation quality and communicates with annotators about their input. A second category of French employees, ‘solution architects’, is responsible for designing models as well as acting as the interface between the final client, the AI startup and the annotation company. They are tasked with writing annotation guides, in cooperation with the dataset manager.
They therefore bring customers’ needs to bear in terms of AI models and annotations. In this way, the final client may provide feedback on their needs or corrections to the model. For instance, store managers may not want textile bags considered suspicious for automatic theft detection. Also involved in AIView’s case are supermarket employees who receive alerts and validate them by detaining potential shoplifters.
Finally, there is a category of ‘workers’ who are unaware of their involvement: consumers (Dujarier, 2016). In the case of AIView, they pass in front of cameras that record theft videos, from which a model will be trained. In CanteenCorp’s case, the model only guesses objects in a tray, so their impact is much greater: consumers have to validate the dish that matches their tray based on three choices the AI system provides in a certain percentage of cases.
Based on the data work typology of Tubaro et al. (2020), we can review how data workers contribute to AI. By describing the various tasks behind this data and AI work, we complement this typology. The examples cited above clearly illustrate how many different tasks are involved in AI work and how many different workers participate. The tasks are diverse and often combine data classification (‘what do we see in the data?’) with model problematisation (‘what problem are we trying to resolve?’). When it comes to data annotation, work can be informal (a gut feeling about the manner in which certain objects are identified) or, on the contrary, quite formal: compiling and updating annotation guides, detecting data anomalies, translating instructions into foreign languages, refining parameters, interacting with clients, validating output.
A key finding from our studies is that data classification practices are embedded within several stages of micro and macro decisions that ultimately shape the service and its reality. The production of AI, we argue, involves multiple workers at various levels of the value chain who ultimately define and redefine how the datasets are structured and how machine learning models work. Our observation is that there are very porous boundaries between modelling (the
AI acts as a workforce coordination mechanism through a wider digital infrastructure
Another aspect of our work has been figuring out what AI means for coordination and monitoring. AI-based services require wider technological infrastructures as well as subjective judgement in producing and refining information.
To produce AI-enabled services, the supply chain infrastructure ties together subcontractors, employees, clients, and customers. Inputs (such as surveillance videos and images of trays) are collected directly from client tools. Subcontractors send alerts directly to supermarket cashiers and security guards when workers detect theft. AI start-up employees monitor data annotation labour and communicate with subcontractors through chat systems: Are there any labels missing from the image of a canteen tray? What makes a certain gesture suspicious in a surveillance video? How should data workers annotate free bread? These discussions involve examining the annotation process, a task that annotators perform as well. On the end-user side, digital infrastructures are cameras and canteen terminals that will eventually collect their inputs. In-store cameras and canteen terminals also serve as a verification system where final consumers may validate the results of the model. Aside from that, AI systems also benefit from discussions between workers in Madagascar, start-up employees, and clients’ managers. As in the shopping bag example, many customers at a supermarket were detected as ‘suspicious’, resulting in several false positives.
In this sense, ‘AI-as-a-service’ (Newlands, 2021) can be construed as a workforce coordination mechanism that collects multiple human inputs to work and rework them via multiple human feedback loops. In other words, its role is to organise the contributions of humans that will enable it to operate effectively. Artificial intelligence is orchestrating several displacements of labour, in line with digitisation and cascading externalisation trends.
AI systems organise displacements and modification of labour
Both cases discussed in this article involve low-paid workers from Madagascar as well as end customers participating for free in AI’s productive activities. AI can take advantage of this low-cost workforce through digitisation to achieve its automation goals. Due to the multiple levels of outsourcing involved in AI production, it creates ‘cascading externalisation chains’ similar to those observed in manufacturing. AI thus does not primarily destroy jobs, but redistributes them to poorly paid workers and unpaid consumers. We estimate that salaries in Madagascar annotation companies range from $50 for a basic data worker to $200 for a manager for a forty-hour week, or from $0.41 to $1.7 per hour.
AI is evidently contributing to several displacements of labour. The first category of labour displacement involves surplus labour performed by consumers and sometimes for workers, as part of a larger trend of outsourcing activities to unpaid customers. Another displacement comes from the fact that core parts of the production chains of AI-enabled services are outsourced to Madagascar. Subcontractors in the Global South replace employees within organisations.
Traditional jobs in client companies are also changing due to AI systems. Supermarket cashiers are now required to respond to alerts when data workers in Madagascar detect something suspicious. In addition to working as cashiers, canteen workers have the responsibility of checking if plates are properly positioned to be detected by the checkout terminal. According to Automatik’s solution architect, their job description will eventually become ‘hospitality workers’, tasked with welcoming patrons and serving meals.
Developing and maintaining AI systems involves several teams, organisations, and constituencies across several countries, which illustrates how complex automation work can be. Rather than the ‘end of work’ that several theorists predicted, we are experiencing its systematic multiplication and displacement.
Discussion
This article documented human work hidden behind AI’s ‘layers of knowledge’. Beyond automation, what does this technology mean for human work? How does an organisation implement AI-based systems?
We discussed the AI production process from the perspective of the work required to implement automation. This study offers insight into the work organisation behind AI development, despite its limited scope. Furthermore, it highlights the importance of production processes and the need to focus on firms rather than only conducting research at the macro level. In this way, we can better understand what AI means organisationally, namely in terms of work reorganisation. Building and maintaining AI-based services requires data classification and model problematisation tasks. Ultimately, we show that AI, and the core tasks necessary to its functioning, are driving a more general trend towards services outsourcing.
In contrast to the existing literature on platformed digital labour, we demonstrate that data workers are more deeply embedded in AI companies’ operations (Gray and Suri, 2019). Consequently, data workers are often invisible to end users, but much less so to companies that produce models (Irani, 2013). So, regardless of how data work is outsourced, the claim that certain services are automatic is the most powerful vector of invisibility. This lack of visibility also affects annotators’ self-worth and perception of themselves. Despite their expertise, data workers see their activity as ‘menial’ compared to highly paid and visible AI workers such as data scientists and software engineers. This ultimately hurts the bottom line of Malagasy workers regardless of their integration into French startups’ operations. Thus, our estimates of their pay (from $0.41 to $1.7 per hour) are significantly lower than Amazon Mechanical Turk’s estimated earnings of $3.13 per hour on average (Hara et al., 2018). By outsourcing data work in this way, AI firms benefit from highly skilled workers. Using AI companies’ control and discussion tools and by integrating with a working group in Madagascar (managers and colleagues), workers contribute to the production of AI-based services, and develop a project-oriented expertise over several months. Rather than eliminating work, AI relies on outsourcing and digitisation to ‘remotise’ it. In that regard, our findings support the notion of ‘complementary work’ that is necessary to supplement AI and which is never completely eliminated (Shestakofsky, 2017).
As regards the organisational aspect of data work, Lehdonvirta et al. (2019) suggested that DLPs are fundamentally different from BPOs. The former allow client companies to distribute projects divided into smaller tasks to a vast number of independent work providers, while the latter allow clients to outsource whole projects. Our findings suggest the existence of a continuum ranging from ‘pure’ DLPs to BPOs. According to recent advances in the literature, hybrid organisations exist; they present themselves as DLPs, but integrate levels of management and supervisory oversight as in BPOs and conventional companies. ‘Deep labour’ is what such hybrid organisations provide, according to Tubaro (2021). In both case studies, we presented evidence that workers in Madagascar are often directly supervised by employees of French AI companies, even though data work is formally externalised. There is no definitive explanation for why firms choose one form of outsourcing over another, but several empirical clues suggest it may be related to project complexity and timeline. It is essential to have a stable, trained, and supervised workforce if annotation quality is critical, if the algorithm requires frequent data updates, or if employees interact with the system in real time, as was the case with the ‘smart’ surveillance camera solution we discussed above. In either case, data workers are fully embedded in the AI-enabled service or entire production chain, no matter the organisational form used to externalise them.
While data workers’ expertise is not always recognised by AI developers, our research shows that, in contexts of AI applied to commercial products, companies value the ability of workers to understand the application domain. In discussing the loss of jobs associated with AI, it is also important to keep in mind that data work is integral to the production process.
Our position is in agreement with that of the authors who stipulate that the impact of AI on jobs cannot be assessed by the likelihood that a specific set of tasks will be automated (Autor and Salomons, 2018). Automation through AI is based on complex assemblages of models and distantiated labour. We discussed two cases of augmented services where data workers participated directly in AI solutions production. In the second case, they go so far as to replace the algorithm.
Recent literature analyses this issue from the perspective of potential AI effects on job quality. According to Benanav (2020), automation in the manufacturing sector and AI in the service sector have not significantly improved productivity. The labour market is not affected by technological innovation, but by an overproduction crisis. Therefore, the question is not whether work has a future, but what it will look like. The author argues that we are more likely to see the generalisation of precarious work and underemployment than the disappearance of all jobs. Micro-level studies should be conducted to analyse the transformation of work following the implementation of AI systems.
Based on our reconstruction of the production chain, we conclude that what’s ultimately sold and deployed as an AI solution is a production process itself, not a finished algorithm. ‘AI-as-a-process’ is a sociotechnical device that automates last-mile tasks. As part of a global trend toward transfer learning and large pre-trained models, French AI companies employ open-source and big-tech development frameworks to customise ready-to-use models. In this context, data-related tasks are fundamental to AI work, and they extend beyond just annotating data. In Madagascar, they also include writing annotation instructions, reviewing models’ behaviour and discussing it with clients. In France, employees of client companies are required to adopt expanded roles as data curators, as well as to supervise the capture of unremunerated consumer inputs. Consumers, data workers, client employees, and AI startup officers share the cognitive workload of looking at the data (
We view AI as a logistic tool that facilitates and coordinates various work processes related to these two functions. We contribute to the literature in Science and Technology Studies that attempts to understand the human activity behind digital infrastructures (Muniesa, 2003). This work is also in line with the work of Jaton (2017) on AI-labs as well as his definition of modelling as a process of problematisation. Commercial AI challenges are constantly (re)negotiated by various categories of workers. In summary, AI systems encompass a range of dimensions related to their organisational, social, and technical structure (Dagiral, 2012), which ultimately ‘does more than make work easier, faster or more efficient; it changes the very nature of what we mean by work’ (Bowker and Star, 2000).
Providing AI-enriched services is labour-intensive, so involving AI in the production process does not mean automating, but allowing labour outsourcing at both ends of the value chain. Our argument therefore is that commercial AI does not automate service jobs but reproduces and enables labour displacements, lengthening externalisation chains. The study concludes that AI production is largely driven by economic arbitrage aimed at wage minimisation. This is also consistent with the seminal argument by MacKenzie and Wajcman (1999), who state that ‘much innovation is sponsored and justified on the grounds that it saves labour costs’.
AI is woven into a long-standing dynamic of job displacement. This is not a result of robots replacing humans, but of the lowest-cost workforce being selected to produce AI solutions. Companies in the North outsource standardised tasks to subcontractors in the South, while maintaining strong control over the production processes at work. Our case studies show that AI is used to transfer service-related tasks from in-house employees to outsourced vendors. Similar dynamics are put in place by companies for manufactured goods. The distancing of the workforce allows for reduced labour costs in sectors where local wages are traditionally high (Esparza, 1992). As a result, volume growth and production costs no longer grow linearly. In this context, automation should be viewed as a form of labour externalisation.
Conclusion
We believe that ethical considerations of AI should begin with working conditions, as data workers play an increasingly significant role in the technology sector. Working conditions and the organisation of the production chain should also be considered along with the devices’ purposes. The results are consistent with computer science research on the archaeology of large datasets, which has found numerous errors in ‘canonical’ AI datasets and the potential impacts on algorithmic bias (Paullada et al., 2020).
Building AI models is a dynamic process involving many actors in the chain. In this context, we have emphasised the importance of integrating data workers into artificial intelligence production. Because of the cyclical nature of model production, they also contribute to model problematisation by taking responsibility for a key element of model production, namely the annotation of the dataset. Through this loop of human work, tasks previously performed locally by workers can now be performed remotely. That is regarded as automation, especially in the service sector.
One limitation of this article is that we did not detail annotators’ working conditions, nor the post-colonial aspect of the relationship between Malagasy contractors and French AI companies. As those aspects are related to the structure of the production chains as well, they should lead to further publications.
Moreover, we find evidence of a strengthening of the professional identities of data workers. Although they are prevented from aspiring to the role of AI experts, they are aware of the importance of their specialised knowledge in particular fields, such as food or security. Further research is needed to examine this trend.
Our results can also be used to facilitate AI regulation by providing insights into how models are produced and the international division of data labour. When evaluating the effects of AI on work, public policies must consider the entire production chain, absent which it would be impossible to observe the productive activities necessary to develop AI. A further question arises as to whether AI biases can be lawfully regulated, particularly in light of EU legislation such as GDPR and the AI Act. Because this legislation requires AI system producers to disclose the processes behind their solutions, we need to ensure that data annotation practices are documented.
AI companies are selling automation promises that can be boiled down to business decisions and managerial choices that involve precarious labour at every level. A fair algorithm can only be developed if its production process is controlled. Therefore, a key element of AI regulation should be ensuring the responsibility of principal companies towards their subcontractors. This will ensure that global chains in the future have a more effective way of taking AI into account.

Data work production chain. Note (from bottom to top): In a subcontracting firm in Madagascar (‘Data company’), data workers annotate, enrich, and tag videos and pictures with the help of local expert agents and managers. The annotated data are then sent to a French ‘AI company’, whose dataset managers and solution architects integrate them into their systems. Afterwards, the systems are sold to a ‘Client company’, whose managers, employees, and customers fine-tune them to meet their needs. Companies carry out AI preparation, AI verification, and AI impersonation (white symbols signify active and grey symbols inactive).
Acknowledgements
The authors warmly thank all the workers who answered our questions, as well as Julie who opened her company to us. We also thank Paola Tubaro, as well as all the members of the DIPLab research project for their valuable comments on this article.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: An anonymous start-up provided an additional source of funding to allow access to their workforce.
Copyright
Copyright © 2023 SAGE Publications Ltd, 1 Oliver’s Yard, 55 City Road, London, EC1Y 1SP, UK. All rights reserved.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article presents the results of the project HUSH ‘The Human Supply Chain Behind Smart Technologies’ funded by the French National Research Agency (Project ANR-19-CE10-0012). The authors of the present article have been involved in the project as doctoral researchers (authors 1 and 2) and principal investigator (author 3). This research also benefited from additional funding provided by a French AI start-up in order to pay interviewees.
