Abstract
Public organisations, like the Netherlands Police, increasingly rely on data. Despite its importance, there has been limited empirical attention to how data is created and how its situated context impacts algorithmic interpretation of data. Using the Netherlands Police as a research focus, this study aims to fill this gap by studying datafication from an interpretivist perspective, emphasising the importance of material factors and human actors engaging in ‘data work’. The main research question is: How do material factors and human actors interact in ‘data work’ to enable datafication of street-level situated contexts at the Netherlands Police? The study builds on nearly 200 h of ethnographic fieldwork with street-level employees at the Netherlands Police. It finds that data work is deeply embedded in daily policing and is shaped by personal values, organisational context, and practical considerations. The findings highlight challenges posed by structured and unstructured data entry in the registration software for police reports. Structured data limit discretion with predefined labels, often conflicting with employees’ own perceptions. Unstructured data offer more flexibility but pose challenges such as linguistic nuance, inconsistencies, and the presence of ‘voice’ in police reports that complicate data interpretation. The study unearths various patterns of the interplay between human actors and material factors in relation to their situated contexts, which impact how police reports are constructed through a registration system. This approach, emphasising the interplay between material factors, human actors, organisational dynamics, and contextual factors, can help public sector organisations take steps towards responsible algorithmic interpretation of data.
Introduction
Public organisations are increasingly dependent on algorithmic systems (Broomfield and Reutter, 2022; Kuziemski and Misuraca, 2020). Data can be considered the lifeblood of algorithmic systems; datasets are searched, queried, linked, analysed, and used to train machine learning algorithms. Data are critical to the functioning of AI systems, for instance, their performance, fairness, safety, and scalability. In short, algorithmic systems cannot exist without data (Muller et al., 2019; Sambasivan et al., 2021).
Whilst the importance of data in the public sector is widely acknowledged, limited empirical attention has been paid to how data comes into existence – how it is collected and processed – and how practices of datafication may affect data and the later interpretation thereof for algorithmic applications. Using the Netherlands Police as a research focus, the current study aims to fill this gap by studying datafication from an interpretivist perspective, emphasising the importance of both material factors and human actors engaging in ‘data work’. The study seeks to answer: How do material factors and human actors interact in ‘data work’ to enable datafication of street-level situated contexts at the Netherlands Police?
Data are constructed, not found (Gitelman, 2013; Kitchin and Lauriault, 2014); as Mejias and Couldry note, ‘the term “datafication” implies that something is made into data’ (Mejias and Couldry, 2019: 1). In this study, the ‘something’ concerns the street-level cognitions at the Netherlands Police. These cognitions are the result of the interplay between street-level police employees’ observations of a situated context and their (often tacit) knowledge and decision-making in that context. Cognitions are subsequently turned into ‘data’ in the form of police reports (Waardenburg, 2021) and processed for algorithmic applications. Whilst algorithmic applications may take various forms, much academic attention has been paid to data in predictive policing algorithms, with multiple authors pointing out that historical data may reproduce biases and inequality, and that poor data quality may cause serious harm (Bennett Moses and Chan, 2018; Brayne, 2020; Brayne and Christin, 2021; Busuioc, 2021; Dencik et al., 2019; Kitchin and Lauriault, 2014; Kuziemski and Misuraca, 2020; Meijer and Wessels, 2019; Perry, 2013). The Netherlands Police is a particularly interesting site to study due to its investment in data science. Street-level police reports serve as a data source and are used, for example, in the Dutch predictive policing software (Oosterloo et al., 2018; van Schie, 2022).
The current research is based on an interpretative design and ethnomethodological approach informed by numerous informal interviews and desk research. The main empirical findings are grounded in nearly 200 h of observations of different types of street-level police employees as they went about their daily work, much of which is data work.
Perspectives on data
To gain a deeper understanding of data work, I approach the concept of data from an interpretivist perspective (Cunliffe, 2011; Geertz, 2000; Kunda, 1995; Morgan and Smircich, 2021; Schwartz-Shea and Yanow, 2012). This perspective pays close attention to the active role human actors play in datafication and those who apply this perspective have pointed to human agency, for example, in selecting relevant data and deciding on how data are depicted and rearranged. Human agency is also involved in the construction of fair and unbiased datasets (Vydra and Klievink, 2019). Critical data scholars argue that data are not ‘discovered’ in the world, but rather captured or curated. Some academics even refer to ‘capta’ instead of ‘data’ to emphasise the human agency in the creation or construction of datasets (Drucker, 2011; Kitchin, 2022; Masson, 2017; Muller et al., 2019).
From the interpretivist perspective, data are created through such social interactions and hold no meaning without attention to their situated context (Bates et al., 2016; boyd and Crawford, 2012; Kitchin, 2022; Kitchin and Lauriault, 2014; Loukissas, 2019; Muller et al., 2019; Waardenburg, 2021). Some critical data scholars stress the mundane and material qualities of data construction (Bates et al., 2016; Pink et al., 2017). In this view, data are co-constructed by material factors (e.g. the hardware and software being used, how information is shared or stored and spatiality of workspaces) and human actors engaging in ‘data work’ in a situated context. This view aligns with the insight from sociomateriality research that the social and material are inseparable, neither taking precedence over the other (Introna, 2016; Orlikowski, 2007). Data work involves assemblages of human actors with material factors and requires physical, emotional and cognitive efforts (Bates et al., 2016; Pink et al., 2017; Thylstrup et al., 2022; Waardenburg, 2021).
This research builds on earlier literature on data work, for example, in healthcare (Møller et al., 2020; Sambasivan et al., 2021). In government organisations, the shift from traditional street-level bureaucracies to organisations which function more at screen- or system-level has introduced ‘data professionals’, with expertise in data and algorithmic processing (van Eck et al., 2018; Zouridis et al., 2020). However, much of the work of constructing data is left to street-level employees documenting their activities (Møller et al., 2020; Sambasivan et al., 2021; van Eck et al., 2018; Zouridis et al., 2020). Since the introduction of data-driven policing in 2008, data work has become deeply intertwined with the street-level police employees’ daily work (Chan, 2001; O’Connor et al., 2022; Terpstra et al., 2019; Waardenburg, 2021). Waardenburg observed that data work comprises roughly one-third of emergency response officers’ activities in the Netherlands (Waardenburg, 2021). Detectives and I&S employees spend even more time on it.
This article is positioned in critical data and algorithm studies paying due attention to data as a result of specific context-dependent activities (Gerlitz, 2017; Meijer et al., 2019; Meijer and Grimmelikhuijsen, 2021; Stahl and Wright, 2018; Wieringa, 2020; Wirtz et al., 2022). Whilst previous studies have acknowledged the importance of data and algorithms’ dependency on them, the impacts of data are often not studied directly. This research thus adds to existing literature by offering empirical evidence of situated data work at the Netherlands Police, providing a categorisation that contributes to critical analyses of the ways in which data are created. This research further contributes to critical discussions on the role of data and technology in public sector organisations and offers new insights for the discourse on responsible introduction, use, and implementation of algorithmic systems in the public sector, particularly the police.
Beyond these academic contributions, the study offers practical value to the Netherlands Police and similar organisations working with ‘ready-to-use’ data. Providing insight into the actual practices underlying the creation of these data, this research can provide data professionals and managers at these organisations with a better understanding of the data they work with and how they might affect their processing and interpretation. Such understanding can help prevent potential issues during algorithmic processing that are not commonly anticipated by data professionals not involved in data creation.
Methods
To answer the main research question How do material factors and human actors interact in ‘data work’ to enable datafication of street-level situated contexts at the Netherlands Police? from an interpretivist and sociomaterial constructivist perspective, I conducted ethnographic fieldwork at the Netherlands Police. In this section I elaborate on the research site and fieldwork.
Research site
The ‘data’ I discuss here are digital data constructed by street-level police employees through the act of filling in forms in the ‘BVH’ registration software (Dutch ‘basisvoorziening handhaving’). BVH functions as the main registration tool for street-level police cognitions in the Netherlands. It is created, managed, and used by the Netherlands Police. This software is used for filing different types of reports, for example, incident and crime reports, witness statements, interrogations, and internal memos. BVH is one of the largest data sources available to the Netherlands Police. Whilst – to the best of my knowledge – BVH is filled and used only by police employees, portions of its data are linked to other (police) systems. BVH data have been used for analyses, for algorithmic systems such as the Dutch predictive policing system (Waardenburg, 2021), and for intelligence decisions, and may be used by non-police stakeholders (Sanhaji, 2022).
Within BVH, a distinction can be made between structured and unstructured data. Simply put, structured data are neatly organised and categorised in a predefined format. Examples within the BVH are incident codes and drop-down menus with a finite number of predefined options for the user to choose from. Structured data are commonly used to achieve standardised input. In contrast, unstructured data lack such predefined formatting. Unstructured data in BVH are registered in open text fields, where the officer adds information without constraint, for example, their written account of a situation. Historically, unstructured data are difficult to process algorithmically, but technological advancements in natural language processing make this increasingly viable. Most police reports in the Netherlands combine both types of data. Photos and other types of visual data are not uploaded within the BVH but in a connected system, outside the scope of this research.
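The distinction can be made concrete with a minimal sketch. It is an illustration only and does not reproduce the BVH data model: the names `IncidentCode` and `Report` are hypothetical, and the two labels are incident codes that appear later in the findings, standing in for the much longer real list.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentCode(Enum):
    # Illustrative subset: two labels quoted in this article, not the real BVH list.
    DISRUPTION_HOMELESS_PERSON = "overlast zwerver"
    DISRUPTION_CONFUSED_PERSON = "overlast verward persoon"


@dataclass
class Report:
    # Hypothetical record, not the BVH schema.
    incident_code: IncidentCode  # structured: must be one of the predefined options
    narrative: str               # unstructured: open text, no constraint on content


report = Report(
    incident_code=IncidentCode("overlast zwerver"),
    narrative="Free-text account written by the officer.",
)

# A label outside the predefined list is rejected by the structured field,
# whereas the open text field accepts any string.
try:
    IncidentCode("a label that is not in the list")
    accepted = True
except ValueError:
    accepted = False
```

The structured field guarantees a standardised value but cannot express anything outside its list; the open text field places no such limit and therefore leaves interpretation to whoever, or whatever, reads it later.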
The BVH-system runs only on police hardware and can be accessed through integrated car systems and officers’ work phones. Whilst preliminary information for a registration is sometimes entered in the system through these means, most of the extensive data work is done at computers in local police stations. The BVH software, including its forms and open text fields, is the main material factor in this study. It is important to note, however, that other material factors, such as the hardware BVH is accessed from, also play a role in police data work.
To gain a deeper understanding of BVH, I conducted ethnographic fieldwork, observing street-level police employees who frequently work with this system and joining them for entire shifts. Street-level police employees are the main human actors in this study. Specifically, I observed emergency response officers, detectives, and employees involved with intake & service (I&S). Emergency response officers use BVH to register emergency situations assigned to them. They often write internal reports, providing details and context about their actions to inform colleagues. They also use the system for criminal prosecution, for example, to write crime reports, witness statements, and conduct preliminary interrogations. Emergency response officers often have specialised assignments, for example, in dedicated neighbourhoods or as digital experts. Whilst emergency calls take precedence, officers focus on their specialties during downtime when on standby for emergencies. This happened during some observations.
Detective work is divided by the seriousness of the reported crime. Small crimes such as theft, simple misdemeanour battery or drug sales are registered in BVH, whilst more extreme cases (e.g. murders) are processed in other software, outside the scope of this research. Detectives use BVH to report interrogations, witness statements or citizen incident reports. BVH is also consulted as an information source on cases and previous actions.
I&S employees mostly handle citizen requests, such as citizens reporting incidents or crimes (e.g. theft or vandalism) or requesting information. In contrast to emergency response officers, I&S employees have scheduled time for report writing. They also spend much time at police station front desks. Like detectives, they consult the BVH-system as a source of information. Due to limited police training, I&S employees are not allowed to make certain decisions or, for example, to interrogate suspects. Consequently, the data they register in BVH are less varied than those of other police employees.
Fieldwork
Ethnography is a common method in the fields of human–technology interaction and critical data studies (Button and Harper, 1996; Kitchin, 2017; Pink et al., 2017). It allows for in-depth study of ‘data work’ practices. My fieldwork was conducted in various teams within one regional unit. My intentions, affiliation, and research interest were clearly stated when interacting with police employees. Adopting a sociomaterial constructivist perspective, I did not attempt to distance myself. Limiting my focus to one regional unit allowed me to familiarise myself with those contexts and become a familiar face to the people working there. I established good working relationships with the people I observed – whom I came to regard as colleagues rather than participants. My integration in these police departments encouraged more openness and information sharing than one-time visits would have. One time I was jokingly ‘crowned’ the ‘queen of BVH’ by a police employee. My observations, thus, cannot be regarded as separate from me as a researcher. Instead of pursuing objectivity, I continuously reflected on my roles as colleague and researcher as well as my role in prompting certain data, for example, by asking questions about BVH.
During the fieldwork, I made short jottings on my phone, which I later developed into extensive narrative fieldnotes (Emerson et al., 2011). These fieldnotes are the core research data used in the study.¹ Thick descriptions were added on everything related to BVH, as well as technology in general (Geertz, 2017). I was also interested in conversations about contextual factors which might influence data, such as fatigue or distraction. Technology, BVH, and contextual factors can be considered as sensitising concepts (Bowen, 2006) and were often the focus of my questions. It should be noted that I did not take a structured approach to in-the-field interviewing; most conversations were informal and related directly to events that took place at that time. During shifts I was typically linked to one police employee, although others were often present and feature in my fieldnotes, especially in emergency response shifts, which are typically conducted in officer pairs. Shifts at the Netherlands Police typically last for about 9 h, although circumstances may cause the officer to leave earlier or later. In total, my observations comprise nearly 200 h (see Figure 1).

Figure 1. Spread of observations.
Throughout the research, preliminary findings were often discussed and interwoven with ongoing fieldwork, whilst enhanced interactions and discussions during my observations allowed for more depth in my research. This also allowed me to work on the argument during the research process. Taking into account this headwork, deskwork, and fieldwork (Van Hulst et al., 2016; Van Maanen, 2011), I conducted a first round of simultaneous open and axial coding in NVivo. During this first round, an initial inventory of factors that impact data work emerged. A second round of more selective coding was conducted on paper, resulting in the division between data work practices (structured and unstructured data) and situated datafication (organisational and human factors). All research data were pseudonymised; names used in this article are fictional. Approval to conduct observations was granted by the Ethical Committee of Utrecht University (FETC-REBO ‘Value-sensitive algorithmisation in the Netherlands Police’). Research data were stored securely in a university system.
Findings
In this section the two core aspects of the research question ‘How do material factors and human actors interact in “data work” to enable datafication of street-level situated contexts at the Netherlands Police?’ will be discussed. First, I address the practices of data work, then I sketch the situated contexts of this data work.
Data work practices
For the first portion of this research, I turn to practices of data work at the Netherlands Police. Specifically, I study how street-level police employees engage in data co-construction practices with the BVH-system and how this impacts the created data. Police reports are typically a combination of structured and unstructured data. As these two types of data have different underlying assumptions, require different practices of data work, and result in distinctive challenges for the data, I discuss each separately in this section.
Structured data
Structured data are characterised by categorisation and standardisation. A police employee inputting structured data in BVH faces predefined options. The underlying assumption relies on material factors: whilst human actors may make mistakes, the materiality of menus and lists with predefined categories is expected to achieve a standardised input.
When starting on a new report in BVH, an interactive form is shown. Here, the police employee must fill in predefined categories and fields. The first field is the incident code. Incident codes (Dutch: Maatschappelijke Klassen) were introduced in the late 1980s and have increased in number since. At the time of observation, there were 685 incident codes. Some describe an incident or event (e.g. specific traffic violations and accidents, different kinds of theft, disturbances or ‘other incidents’), whilst others describe activities and actions (e.g. use of coercive means by the officer) (Sanhaji, 2022). Finding the ‘correct’ incident code is not straightforward. Despite the standardisation efforts, officers have significant discretion when choosing codes.
During an emergency response shift, Mick² shows me the prior registrations in BVH of a woman he is writing a report about. It's a long list, and I notice there are different types of registration connected to her. Sometimes the incident code is ‘disruption homeless person’ (Dutch: ‘overlast zwerver’), which Mick has chosen today as well. There are also many entries coded ‘disruption confused person’ (Dutch: ‘overlast verward persoon’). When I ask Mick, he tells me the categorisation depends on how an officer feels at the time. Today, the woman was quite confused, but Mick's prior knowledge that she is homeless informed his choice. He says he could just as well have chosen ‘confused person’ as an incident code. In this case, it seems there is no ‘correct’ incident code, and multiple options could apply. Rather than material factors, it is the human actor who decides which code is used.
The incident code determines which other forms become available; e.g. sometimes a witness report form might be necessary, but not all incidents involve witnesses. The incident code thus defines what other data an officer can and must enter and which roles are available for the citizens involved.
During another observation, I joined Stefan on a call concerning a man who threw a bottle at a house. We couldn’t find the man, and a few hours later Stefan is entering the incident in BVH. He wants to use the incident code ‘wantonness’ (Dutch: ‘baldadigheid’), which he deems the most appropriate. That code, however, defines the man as ‘suspect’, which Stefan finds unfair, as there was no arrest or warning. He therefore chooses the less accurate code ‘argument/conflict’ (Dutch: ‘ruzie/twist’), which allows the more neutral ‘involved’ role. In this case, Stefan prioritised what he regarded as the correct role over the correct incident code. This connection in the software between role and incident code frustrates many officers, pushing them to workarounds. As summarised by detective Eelco, this connectedness impacts the data directly as either the role or incident code is inaccurate. ‘In either case, the system causes faulty data, and that fault ends up in an analysis’.
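The coupling between incident code and role that Stefan runs into can be sketched in a few lines. This is a hedged illustration rather than a description of how BVH is implemented: the mapping `ROLES_BY_INCIDENT_CODE` and the functions `register` and `workaround` are hypothetical, and only the two code and role pairs named in the observation are taken from the fieldwork.

```python
# Hypothetical mapping. Only these two code/role pairs come from the observation;
# the real codes may offer further roles.
ROLES_BY_INCIDENT_CODE = {
    "baldadigheid": {"suspect"},   # 'wantonness'
    "ruzie/twist": {"involved"},   # 'argument/conflict'
}


def register(incident_code: str, role: str) -> dict:
    """Accept a registration only if the role is offered for the incident code."""
    allowed = ROLES_BY_INCIDENT_CODE[incident_code]
    if role not in allowed:
        raise ValueError(f"role {role!r} is not available for code {incident_code!r}")
    return {"incident_code": incident_code, "role": role}


def workaround(preferred_code: str, preferred_role: str) -> dict:
    """Keep the preferred role and fall back to any code that offers it."""
    try:
        return register(preferred_code, preferred_role)
    except ValueError:
        for code, roles in ROLES_BY_INCIDENT_CODE.items():
            if preferred_role in roles:
                return register(code, preferred_role)
        raise


record = workaround("baldadigheid", "involved")
```

Under these assumptions, `record` carries the role the officer considers fair but a code he considers less accurate. An analysis that later counts incidents by code has no way of seeing that the code was chosen for the role it unlocked.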
Some of the fields in the BVH contain drop-down menus for further specification. Such specifications are not always clear.
Robin and I are sitting at his computer, having just returned from the scene of a theft. BVH asks Robin to select a store type from a drop-down menu. The theft occurred at a wholesaler for hairdressers. With some doubt, Robin selects ‘wholesaler’. He continues the registration, but when he wants to save it, a pop-up appears; the store is recognised by BVH as a ‘jewellery store’. I am surprised; it was clearly not a jewellery store. Robin seems unfazed. He explains that the store must have been entered in BVH before, by a colleague who also doubted the type. The system recognises the store and regards Robin's choice as ‘wrong’. As Robin cannot change this, the wholesaler is added as a jewellery store again.
The rigidity of the BVH is a material factor resulting in a mismatch between the registered data and Robin's human interpretation. Drop-down specifications can require specialised knowledge, and police officers often complain there are too many options. They are frustrated at the amount of time – a limited resource – it takes to find specifications. They do not acknowledge the necessity or usefulness of these specifications.
One particularly telling example occurred when I joined Sandra, who works at intake and service. Sandra was processing a citizen incident report. A suitcase filled with camera equipment was stolen. The crime took place in a different country, and the police will not act upon it; the report is merely a requirement for the insurance company. In case of a theft, BVH requires the police employee to specify the stolen goods. Sandra has received a list of the stolen camera equipment by e-mail and is about to start entering it in BVH. ‘Yeah… what is that?’ she wonders, as she reads the technical name of the first product. ‘What are we going to register this as?’ She takes out her work phone to look up the product on Google Images, but that offers no clarification. Sandra types ‘photo’ in the search field of BVH.
I am surprised at the number of options that pop up. I see different types of battery packs, flash devices, measurement devices and more. Some are even further specified, e.g., a distinction is made between a ‘prism viewer’ and an ‘infrared viewer’, and different kinds of lenses: macro, telephoto, standard and wide angle. Making the ‘right’ decision is no easy feat. After a while, Sandra finds the option ‘DK-16: photo other’. She is relieved. ‘Everything into “other”. I’m not going to try and figure all that out on Google, am I?’ she says, and she proceeds to choose the ‘other’ category for each of the goods. Even then, it is a lot of work. There are 20 goods in total, and the entire affair takes over half an hour. Whilst the options in the drop-down menu are intended to gather specified information about the theft, they have the opposite effect; the sheer number of options BVH offers and the specialised knowledge required tempt Sandra into using the least specific data option available: ‘other’.
The human actor and material factors thus create less specified data than if Sandra had been offered fewer options. Whilst she did attempt to specify at first, she was quickly discouraged from continuing this effort.
In sum, the structured fields in the BVH-system and the infrastructure of its software shape police employees’ decision-making and limit their discretion, as officers try to fit situations into predefined labels. However, police employees’ more nuanced understanding of incidents also impacts their data work. The interplay between this human agency and the material properties of the BVH software presents challenges to structured data and the interpretation thereof. The rigidity of the system can lead to data not conforming to employees’ views, and police employees may find workarounds if the predefined labels do not fit their cognitions. Police employees often resort to selecting more generalised options when presented with an overwhelming number of choices, leading to less specific data. Structured data in BVH are thus a result of the mutual shaping by human actors and material factors.
Unstructured data
Unlike structured data, unstructured data lack predefined formatting. The underlying assumption is that not everything can be standardised. BVH includes open text fields, where officers manually input information without the material constraints of structured data. This includes written portions of citizen incident reports, video descriptions, witness reports, or internal memos. Although sometimes officers rely on paper notes to remember events or details, this is increasingly replaced by preliminary on-site report-writing on mobile phones, all of which constitute material factors. Note-taking in emergency situations is often rushed, whilst employees in non-emergency situations may have time for more extensive note-taking. Quite often, no notes are taken at all, and officers rely entirely on memory.
International research found that the accuracy and completeness of police note-taking and report writing are essential to juridical prosecution of criminal cases (Güss et al., 2020; Vredeveldt et al., 2018). Increasingly, reports are also used for algorithmic applications. However, note-taking and report-writing are no easy tasks, whilst police training on these tasks is limited (Byrman and Byrman, 2018; Gregory et al., 2011; Güss et al., 2020; O’Connor et al., 2022; Vredeveldt et al., 2018; Waardenburg, 2021). Previous research pointed out the prevalence of errors where details are omitted by officer discretion – for example, the officer did not deem a particular detail relevant – or by accident – for example, forgotten details (Gregory et al., 2011; Güss et al., 2020; O’Connor et al., 2022). Omission of data was common during my observations as well. In some cases, this was a conscious decision in favour of the persons involved. Details that may make a person or case seem more negative, such as aggressive behaviour without injury, were often omitted. The following observation illustrates this.
Enrico conducted a citizen incident report with a man who had recently been attacked. The victim made statements like ‘I saw it in his eyes, this is not the first time he did something like this, he was really angry’ and ‘he is only a few steps away from TBS’ (TBS refers to court-ordered treatment). Although Enrico tried to make the statement as close to what the man was saying as possible, he left out these negative and unproven statements about the attacker.³
Omission of data could also comprise errors in judgement or forgetfulness, where material factors play a greater role. This became most clear when I joined detectives, as they often cope with missing information.
Astrid had to work a case in which three people were arrested. However, no descriptions of their appearances were added by the emergency response officer who had performed the arrest a day earlier. ‘Sometimes, when you are working in emergency response, you think a story makes sense’, she explains, ‘but without prior knowledge about the case you notice information is missing. I also used to make those mistakes in emergency response, of course’, she adds. Next to us, a colleague is frustrated: a report states a suspect voluntarily gave up the access code to their phone, but the code itself was not written down. The colleague now has to track down the officer who wrote the initial report, and hope they remember the code.
Further, I regularly noticed inconsistencies or linguistic mistakes in the actual writing of texts. Officers make spelling errors and typos, sometimes caused by material factors such as faulty hardware. Whilst most officers re-read their texts after finishing them, fixing many of these mistakes, some remained in police reports. In some cases, small linguistic errors changed meanings or nuances in a report, for example, the switching of two letters changed an address into another – existing – address. One particularly interesting case concerns a citizen incident report about a threat.
Sitting at the kitchen table in his apartment, the victim told officer David and me that ‘he [the suspect] said he was a little psychotic’. David writes the incident report in his paper notebook and reads it to the victim after he is finished. The victim agrees and signs the notebook. We leave for the office, where David directly registers the incident in BVH. I notice that David writes that ‘the suspect said he was in a psychosis’, which turns the ‘soft’ statement supposedly made by the suspect and retold by the victim, into a much ‘harder’ and factual statement.
These findings on linguistic nuances and small textual and linguistic inconsistencies are in line with earlier research (Byrman and Byrman, 2018).
Next to errors and discretionary decisions, these texts have what I term ‘voice’. With ‘voice’, I refer to the presence or personality visible within the text. Texts of citizen incident reports and witness reports, for example, are mostly written from the first-person perspective (‘I’). This signals that the words are the citizens’ own words. However, it is the police officer asking specific questions and entering the responses into the system, thereby adding their own voice. To enable prosecution, officers are expected to write statements from a direct sensory perspective, using sentences such as ‘I saw x’, or ‘I heard person a say y’, or ‘I felt z’. This is confusing, as the ‘I’ in these statements refers to the citizen, whilst, as Robin explains, ‘nobody talks like that’.
In these two types of reports, I realised there was no standard for what should be quoted directly versus what could be paraphrased by the officer. This seemed left to officer discretion and was likely decided on without much consideration (cf. Byrman and Byrman, 2018). The wording in these reports seems to be a co-construction of police officer and citizen, governed by the workings of the Dutch juridical system as well as the ‘rules’ of BVH. However, the citizen is expected to sign their statement after reading or hearing it, confirming the words are their own. The matter of ‘voice’ is mainly a matter of human agency, conforming with the underlying assumption of unstructured data. However, the act of signing the report to change the official ‘owner’ of the voice is a material one, as is the fact that it is the officer writing the report, not the citizen on site or on the other side of the desk.
Another type of police report where ‘voice’ might have an impact is suspect interrogations. All interrogation reports I witnessed were conducted in a question–answer interview format. The officer wrote down questions and answers given by the suspect in BVH. Sometimes officers edited the text for readability and coherence (e.g. ‘fixing’ grammar) whilst keeping true to the content of what is said.
When working a theft case, Luisa reads an interrogation report written by a colleague. In this report, answers seem to be written down verbatim, including language typical for certain minority groups (e.g., Dutch: ‘je weet toch’, a phrase commonly used by youth with migration backgrounds). Later on in the interrogation we read that the suspect feels the police ‘always fucks me over’ and he calls them something akin to ‘fucking police’ (Dutch: ‘kankerpolitie’). This language, paired with the materiality of the BVH-system which allowed Luisa to read the report before meeting the suspect herself, affected Luisa. She expresses low expectations about the suspect and his cooperation in her investigation. These expectations proved wrong, as the suspect was cooperative.
Such language ending up in an algorithmic analysis may also function as a hidden proxy for ethnic minorities, even when other measures are taken to prevent biases, for example, prohibiting the use of ethnicity and common proxies as indicators. Finally, in some cases, there is a clear police voice present. This was more common in reports meant for internal use rather than in other types of reports. Police voice can entail use of police-specific abbreviations, metaphors or imagery, that is, non-literal language that is known to be difficult to analyse algorithmically. It can also include statements that could be considered questionable.
In one internal memo, for instance, I read that the officers that were on-site ‘clearly saw’ that the involved persons belonged to gypsy families. No clues were given about how they came to that conclusion, or what made them so certain. When I asked Sjoerd about it, he explained that, as these notes were mostly read by police colleagues, it was deemed less important to be formal.
Once again, the materiality of how these memos are shared through the software impacts human decisions on how to register data, which may later be used for algorithmic analyses.
In conclusion, the analysis underscores the role of the human actors in shaping the narrative content of unstructured data. Police employees’ decisions reflect legal requirements, institutional norms, and their personal discretion. Material factors play a role too. Note-taking and report-writing are complex and multifaceted activities. This difficulty, paired with the extensive freedom offered by unstructured data and issues with recall, for example, due to the lack of time and prevalence of interruptions previously discussed, impacts datafication of street-level cognitions. The omission of data, as well as linguistic mistakes, nuances, and inconsistencies, was common. Finally, the presence of ‘voice’ within reports introduces a new layer of interpretation and poses challenges regarding, for example, bias, as it is not always clear whose voice is reflected in a text. Paired with the inconsistencies in reporting practices such as verbatim representation of speech, use of metaphors and police-specific language, unstructured data are very complex to interpret and process algorithmically.
Situated datafication
In this section, I aim to gain a deeper understanding of the situated context of data work. Whilst police employees acknowledge the importance of data work, which takes up much of their time, they care little for the algorithmic processing of these data. They are unaware of how the data they construct are used for algorithmic models or analyses, which corroborates earlier research findings (Madan and Ashok, 2023; Veale et al., 2018). The datafication of their street-level cognitions is governed by the situated context. Organisational and human dynamics govern discretionary decisions on what street-level cognitions will be datafied and how this datafication happens.
The organisational
As data work is deeply embedded within the work practices of street-level police employees, the organisational context has much influence. In my research I uncovered four main organisational context dynamics that impact data work and datafication at the Netherlands Police. The first concerns the dynamic nature of policing; data work is often interrupted. Consider emergency response officers who must drop everything to rush to a high-priority incident. This can happen at any time, and thus often interrupts data work, conversations, or lunch. Emergencies take precedence. Regular office dynamics also interrupt data work, for example, when a colleague needs attention or help. For front-desk employees, interruptions can also be caused by citizen visits and, for example, package deliveries.
This was the case when I spent a short while at the front desk with Sandra. Within the short timeframe of 88 min, a total of 8 people came to the desk interrupting her data work. Sandra told me this was a common occurrence, to the extent that she felt she couldn’t perform any ‘concentrated’ or ‘high effort’ tasks such as citizen incident reports or describing video material while seated at the front desk.
The spatiality of this situation is an important material factor. The fact that Sandra was sitting at the front desk, which is publicly accessible and placed centrally in the room, invites visits and interruptions. Interruptions such as these negatively affect police employees’ memory, especially if much time has passed between an incident and writing the report – a common occurrence. In fact, on more than one occasion, I found myself struggling with my position as a researcher versus a helpful colleague. Due to my extensive note-taking, I had a good recollection of events and details, such as exact times. Officers often turned to me if they had trouble recollecting. Had I not been there, situations would have been reported less exactly or omitted altogether. The dynamic nature of policing thus impacts data accuracy and completeness.
The second dynamic is the strain of police work – including data work. Data work can be bodily straining, particularly for emergency response officers who work in shifts. Waardenburg (2021) notes that desk work is found to be physically straining, as officers are trained for physically exerting work rather than desk work. Officers have told me shift work affects their biorhythm and negatively impacts their memory and concentration, which in turn will affect their data work. Their bodies, then, also comprise a material factor. Even though I worked fewer shifts than the officers, I experienced similar strain and felt more tired during my period of fieldwork than usual. Particularly during night shifts my own data quality and attentiveness took a dip. As Joey noted during a particularly boring night shift: ‘I always think “great, I can do my administration at the end of these shifts”, but then I don’t manage to get a letter on paper’.
Police work can also be mentally straining. Although many have told me they have a ‘changed worldview’ where they become insensitive to disaster, at times a specific event does touch an officer. This was the case during the following observation:
A 16-year-old girl went missing during my shift with Sebastiaan, and we went to the parents’ house to collect more information for the search. The incident seemed serious. Luckily, the girl came home on her own accord. However, the incident impacted Sebastiaan and his more junior partner. We discussed it in the car ride away from the parents’ house. This was the first time they felt a missing persons case could have ended very badly. It had worried them, whilst they had tried to keep calm towards the parents.
Emotionally challenging situations such as these can make it tough for an officer to focus; previous research has noted that stress affects memory and police report-writing (Beehr et al., 2004).
Third, the organisational culture and rhetoric surrounding technologies may impact datafication, primarily facilitated through technological means. I observed a dominant negative technological narrative. Employees mentioned their limited technological competence and organisational techno-optimism. These conflicting narratives hinder technology adoption. Most police employees I observed lack confidence in their technological competence and technologies in general. These findings align with literature reporting that police officers have limited data literacy, and data and technology receive limited attention in training (O’Connor et al., 2022; Sambasivan et al., 2021; Sanhaji, 2022; Waardenburg, 2021). Their perceived incompetence is augmented by a perceived organisational techno-optimism (Stol and Strikwerda, 2017). Officers often complain about the organisation ‘dumping’ new technology on them without offering support. Officers have little faith in the technologies they must work with, nor do they trust their own competence to work with them. These frustrations impact the data construction. Officers may refuse to adopt new technologies, sticking to old data-work practices, or pick ‘easier’ options where available. Finally, although I have not observed this, previous research indicates officers might try to avoid data work altogether (Waardenburg, 2021).
Limited resources form the final organisational dynamic. The Netherlands Police suffers from serious undercapacity. One chief mentioned a shortage of 23 FTE (full-time equivalent) that summer, including holidays. There is often simply too much to do in a very limited amount of time. Data work costs time and suffers under high pressure to be efficient and completed quickly. This functions as a catalyst for the other factors I have discussed. There is little time or patience for new technologies or for extra work when something goes wrong, and interruptions are more of a problem when time is limited. In addition to experiencing strain, an officer may feel a need to rush data work in favour of other responsibilities.
In sum, as data work is deeply interwoven with the lived reality of these street-level police employees, it is affected by the organisational context of data work. My research highlights that the dynamic nature of police work, the strains of policing, the limited resources and organisational cultural narratives at the Netherlands Police impact the data produced by street-level police employees.
The human
Human dynamics also impact situated datafication. Values held by street-level police employees play a large role in their discretionary decision-making when it comes to data work. In my research, the values of prosecution, collegiality, and service to citizens were most prevalent.
Mick and David refrained from recording a call about a woman yelling at children. Having visited her twice already, they felt a third time would not improve the situation. ‘Unless she really starts attacking those children, there's nothing we can do with it anyway’, one of them notes.
In this case, the lacking potential for prosecution was a reason not to respond and report the situation.
Conversely, during an observation with a detective, the question was raised whether video material should be considered in a case. Although the case was quite small – a suspect threw a stone at their neighbour – the detective offered that the circumstances and history of trouble and police intervention did warrant a more serious look. The video might support a charge of attempted homicide and jailtime. The neighbourhood might get calmer, saving police resources.
The potential for prosecution thus governed this detective’s decision. In cases likely to lead to (future) prosecution, officers were more extensive when reporting in BVH.
The second value, collegiality, encouraged officers to report minor incidents and details, anticipating a benefit for colleagues handling related calls and requests, as the information would be available to them through the BVH software, a material factor. This data was entered to help colleagues. For example, details about a citizen’s attitude might inform how to approach that person. Officers often gave more context in cases involving people where they deemed future contact likely.
For example, I observed a police employee who, when reporting about an elderly woman, added the contact information of her son. While there had been no previous contact, the woman called the police multiple times that day telling a confusing and incoherent story. Her son explained she suffers from dementia. By reporting his contact information, the next colleague responding to a call about or from this woman would have the necessary information at hand.
The third value relates to officers’ public role, providing services to citizens. This manifested as, for example, being more lenient when citizens have no prior registrations or cooperate in a friendly manner, leading to less extensive data entry. Service could also consist of strictness rather than cutting slack.
Jordy and Stefan chose to stop a driver who had a small child in the front seat of their car. They registered the incident, hoping it would convey the importance of car safety to the driver and prevent a serious accident. Service often entailed extra work for the police.
Aside from these three types of values, more practical considerations played a role as well, for example, an emergency call interrupting a fine, preventing administrative burden by not arresting an offender who was cooperative, or adding extra detail in a report to prevent additional questions or system errors. Practical considerations were also visible in earlier literature on police data work (Byrman and Byrman, 2018; Waardenburg, 2021). The values of street-level police employees as well as practical considerations thus dictate what events or details are datafied.
In sum, data work at the street level has become an integral part of police employees’ activities. The creation of data is driven by values of prosecution, collegiality, and service to citizens, as well as practical considerations. The further processing of data by data professionals is of little concern to police employees engaging in street-level data work; they are often unaware of the implications of the data they construct for algorithmic applications. As remarked by Guido: ‘using [our data] for analyses is fine and all, but it should not inconvenience us’. Not all data is created equal: some events and details are not turned into data, whilst others explicitly are. There are many choices behind the datafication of street-level police cognitions; the situated context largely dictates what ends up being registered in police reports as data and how exactly that is registered.
Conclusion and discussion
As algorithmic and data-driven systems are increasingly introduced in public organisations, scholars and data professionals alike have drawn attention to the pivotal role of data quality in such systems. However, only limited empirical attention has been paid to how such data come into existence and how practices of datafication may affect the data and interpretation thereof. Taking an interpretivist and sociomaterial constructivist perspective, the current study fills this gap, asking ‘How do material factors and human actors interact in ‘data work’ to enable datafication of street-level situated contexts at the Netherlands Police?’
This ethnographic research on data work conducted by emergency response officers, detectives, and intake & service employees at the Netherlands Police shows that data work and datafication of street-level cognitions are situated tasks, deeply embedded in daily activities. Datafication is informed by police employees’ values, organisational context dynamics and practical considerations, and thus far from neutral or objective. There is no guarantee that these goals and values are in accordance with those of data professionals who are using the data for algorithmic applications and analyses. If the data professional is not aware of differences in goals and values, they might misunderstand and mis-qualify data constructed at street-level, which could lead to misinformed algorithmic systems and skewed images of situations.
The discussion of specific data work practices underscores how data entered in the BVH-system is the result of a mutual shaping by human actors and material factors, within the situated context of datafication. Structured and unstructured data each bring their own data-interpretation challenges.
Structured data in BVH limit police employees’ discretion to predefined labels. The materiality of these menus and lists is expected to standardise inputs, but these labels are often incompatible with employees’ cognitions. Despite the rigidity of the BVH-system, employees find ways to circumvent labels and often resort to selecting more generalised options due to the overwhelming number of choices the BVH-system offers. In contrast, unstructured data allow police employees much more freedom and discretion in data work. However, the complexity of note-taking and report-writing makes unstructured data work challenging. Some officers opt for paper notes, but this practice is increasingly replaced by writing preliminary reports on-site on mobile phones. Often, no notes are made at all, and officers rely solely on their (collective) memory. Omissions, linguistic nuances, faulty hardware, and other inconsistencies impact the data and may lead to interpretation difficulties during algorithmic processing. This may cause harm, for example, when citizens are labelled incorrectly and treated accordingly. Unstructured data also introduce the presence of ‘voice’ in police reports, and it is not always clear whose voice is reflected in a text. Police-specific language use and inconsistent practices, for example when it comes to verbatim representation of speech and use of metaphors, complicate algorithmic interpretation of these data and open the door to biases and misinterpretation.
Evidently, data entered in BVH are not ‘factual’ despite the aura of objectivity standardised information sometimes has (cf. O’Connor et al., 2022; Waardenburg, 2021). This is particularly relevant as data are used in algorithmic applications. BVH is one of the data sources for the Dutch predictive policing software and will likely be used in other types of algorithmic applications in the (near) future. Data professionals working with police reports should recognise the complexities of their construction. Overall, the research reveals the intricate interplay between human actors and material factors in the datafication of street-level situated contexts at the Netherlands Police. The study provides practical value to the Netherlands Police and similar organisations working with ‘ready-to-use’ data. Specifically, it offers data professionals and managers insight into data creation, and how that might affect the data and its interpretation. This understanding can help organisations be better equipped to prevent potential issues during algorithmic processing that are not commonly anticipated by data professionals not involved in data creation. Further, this study provides three main contributions to the literature:
First, it adds an empirical account of data creation with a specific focus on situatedness. The importance of data for algorithmic systems is widely acknowledged (Muller et al., 2019; Sambasivan et al., 2021), and it is commonly accepted that data (or ‘capta’) are constructed (Drucker, 2011; Gitelman, 2013; Kitchin and Lauriault, 2014) by assemblages of human actors and material factors engaging in ‘data work’ (Introna, 2016; Møller et al., 2020; Orlikowski, 2007; Pink et al., 2017; Thylstrup et al., 2022; Waardenburg, 2021) in specific situated contexts (Bates et al., 2016; boyd and Crawford, 2012; Kitchin, 2022; Loukissas, 2019; Tkacz et al., 2021). However, these processes are often not studied directly. This ethnographic study provides an in-depth account of how this happens in police practices.
Second, the findings provide new insights for the academic debate on responsible introduction, use, and implementation of algorithmic systems in the public sector, particularly the police. With public organisations increasingly reliant on algorithmic systems (Broomfield and Reutter, 2022; Kuziemski and Misuraca, 2020), data and datafication are key to responsible implementation of such systems (Gerlitz, 2017; Meijer et al., 2019; Meijer and Grimmelikhuijsen, 2021; Wieringa, 2020; Wirtz et al., 2022). Public administration literature points to the importance of data quality, collection, analysis, and use (Ruijer et al., 2023), but the creation of data through data work has received much less attention. This study contributes to these discussions by highlighting the importance of data creation for analyses of responsible algorithmisation.
Third, the study provides a preliminary categorisation, contributing to critical analyses of data creation through data work. Specifically, the study highlights a distinction between structured and unstructured data, surfacing their different challenges to data creation, interpretation, and algorithmic processing. The study also shows the relevance of categorisations of organisational dynamics (e.g. office culture, bodily and mental strain and priorities) and human dynamics (e.g. values and practical considerations) for understanding patterns of data creation. Whilst other types of dynamics not included in this research may also affect datafication and data work practices, this initial categorisation provides a useful starting point for future research.
Limitations
The approach taken in this article allows for a deeper understanding of the interplay between human actors and material factors in relation to their situated contexts, and how they coincide in data construction. It provides an initial understanding of how this might affect the data, interpretation thereof, and potential future algorithmic processing. However, the scope of the empirical research in the current study is limited to data construction. It would be interesting for future research to broaden its focus to include empirical fieldwork on how those data are subsequently processed algorithmically and to what extent data professionals cope with the patterns identified in the study.
The focus on the specific case of police reports in the Netherlands poses an important limitation for the generalisability of this study. As situated contexts are essential to the approach taken, the findings may not be fully applicable to police organisations in other countries or other types of (public sector) organisations. Nevertheless, the patterns discussed in this study may be indicative for these other contexts and would be a useful starting point for future research. As such, the current research can still provide insights for other organisations that find themselves with large amounts of data just ‘lying around’. Rather than taking such data for granted, subconsciously giving it value as an objective source of information, organisations would do well to consider the datafication processes underlying these data. This is particularly important in the public sector, where efforts towards responsible algorithmisation might impact citizens’ lives and their trust.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by funding from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek for the project ‘ALGOPOL. Value-Sensitive and Transparent Algoritmization: Key to Building Citizen Trust?’ (algopol.sites.uu.nl) [406. Q1 DI.19.011].
