Abstract
Increasingly, researchers are assessing the causal effects of procedurally just treatment by police on public attitudes using experimental vignettes across the world. However, there are two key limitations to this body of research, particularly when drawing causal conclusions about theoretical mechanisms. The first is that most research on procedural justice, and particularly using experimental vignettes, has been conducted in countries with similar roots in policing (i.e., Anglo-Saxon, English-speaking countries). The second limitation is that research on procedural justice theory using vignettes often fails to closely assess the mechanisms and potential confounds linking experiences of treatment and subsequent attitudes. The current study aims to address both of these gaps by replicating US experimental vignettes among a representative sample of Dutch residents. Specifically, we field a 3 × 2 × 2 between-subjects vignette to assess different components of procedurally just treatment by police on individual attitudes during a traffic stop. We assess causal assumptions using a series of follow-up questions about placebo characteristics, and investigate underlying mechanisms by analysing open-text responses following each police legitimacy item. The results from the current study show that, while procedurally just treatment by police was statistically related to perceptions of encounter-specific police legitimacy, the strength of the causal effect and underlying mechanisms were not so clear. Not only was a key causal assumption violated, indicating that the effect does not run only through the treatment, but respondents suggested that they would comply and trust police in the situation based on broader societal norms and expectations, not necessarily because of police behaviour. Methodologically, our study suggests that researchers using experimental vignettes need to pay more attention to causal assumptions. 
Theoretically, our study shows that those interested in testing procedural justice theory (and criminological theories more broadly) must think critically about identifying and evaluating competing mechanisms in different societal and institutional contexts.
Introduction
Increasingly, researchers are assessing the causal effects of procedurally just treatment by police on public attitudes using experimental vignettes across the world (Abril et al., 2024; Jonathan-Zamir et al., 2024). Procedural justice theory proposes that when police treat people with respect and neutrality, express trustworthy motives, and allow voice, individuals are more likely to perceive the police to be legitimate, support police decisions, comply with officer directives, and ultimately obey the law (Tyler and Huo, 2002). Any interaction with the police therefore has the potential to be a ‘teachable moment’ that can change attitudes for better or worse (Tyler et al., 2014). Typically, vignettes depict an interaction between police and an individual, wherein elements of procedural justice (respect, trustworthy motives, neutrality, voice) are manipulated, after which respondents are asked to report their perceptions of police. Part of the causal evidence for procedural justice theory, and its intermediate mechanisms, is drawn from this body of experimental research, as randomized field experiments are costly and challenging to implement (e.g., MacQueen and Bradford, 2017).
However, there are two key limitations to this body of research, particularly when drawing causal conclusions about theoretical mechanisms. The first is that the vast majority of research on procedural justice, and particularly using experimental vignettes, has been conducted in countries with similar roots in policing (i.e., Anglo-Saxon, English-speaking countries such as the United States or Australia) (Nivette et al., 2024). While these countries differ in several aspects with respect to policing strategies and cultures, they nevertheless share certain features that allow researchers to make some generalizations across contexts. Within regions, including continental Europe, policing cultures and strategies can vary substantially (De Maillard and Roché, 2018), which in turn influence the likelihood and nature of police-public contact and interactions (De Maillard et al., 2018). More replications are needed to assess to what extent these theoretical mechanisms operate similarly across different societal and policing contexts.
The second limitation is that research on procedural justice theory using vignettes often fails to closely assess the mechanisms and potential confounds linking experiences of treatment and subsequent attitudes (Nivette et al., 2024). One key causal assumption here is information equivalence, also known as excludability, which states that the effect must run only through the treatment and not through some other pathway (Dafoe et al., 2018). In relation to procedural justice theory, manipulating treatment may not only influence perceptions of respectful treatment but also perceptions of the officer or subject's characteristics, the environment, and other potential confounding mechanisms. Interrogating these assumptions, as well as underlying causal mechanisms, requires one to measure and evaluate potential confounding or alternative pathways within the design.
The current study aims to address both of these gaps by replicating US experimental vignettes among a representative sample of Dutch residents. Specifically, we field a 3 × 2 × 2 between-subjects vignette to assess different components of procedurally just treatment by police on individual attitudes during a traffic stop. We assess causal assumptions using a series of follow-up questions about placebo characteristics, and investigate underlying mechanisms by analysing open-text responses following each police legitimacy item. This design is a direct replication of Nivette and van der Vegt (2025), and a conceptual replication of prior police-citizen contact vignette studies implemented in the United States (e.g., Reisig et al., 2018).
Procedural justice policing in a comparative perspective
Based on procedural justice theory, the theoretical expectation is that a single encounter can be a ‘teachable moment’ based on how the officer interacts with and treats the subject. This effect is rooted in the idea that individuals value interpersonal treatment above instrumental concerns, and that treatment subsequently shapes how individuals view specific encounters as well as the police more generally (Tyler and Huo, 2002). There are two key dimensions to procedurally just treatment: the quality of treatment (i.e., respectful treatment, expressing trustworthy motives) and the quality of decision-making (i.e., allowing voice during the interaction, making neutral decisions based on facts) (Solomon, 2019). Following each encounter, individuals are expected to update their legitimacy beliefs based on their perception of the officer's (or officers’) specific actions (Dan-Irabor et al., 2023).
There is no general consensus on how to define or measure police legitimacy (Blount-Hill and Gau, 2022). Some have drawn from more formal definitions of legitimacy as the recognition of the ‘right to rule’ (Bottoms and Tankebe, 2012), while others focus on perceived obligation to defer to authorities’ directives and judgements (Tyler and Huo, 2002). In the current paper, we adopt a working definition of legitimacy as the ‘belief that the institution [the police] is appropriate coupled with an internalized obligation to obey’ (Jackson and Pósch, 2019: 183). In Tyler's (2023) review of legitimacy-based models, he identifies three typical ‘indexes’ of police legitimacy: perceived obligation to obey, expressions of trust and confidence, and normative alignment between police and community. However, researchers have criticized this model and measurement of police legitimacy, as feelings of obligation can be motivated by more than just perceived legitimacy of the police (Tankebe, 2013). Other non-normative motivations, such as fear, powerlessness, expediency, or other instrumental concerns, can motivate situational compliance and obligation to obey (Akinlabi and Murphy, 2018). While we are not able to resolve these theoretical and methodological debates here, we acknowledge the contested nature of definitions of legitimacy, and therefore take a broad and relatively agnostic approach to measuring police legitimacy in this paper. Specifically, we draw from Tyler's analytical conceptualization of police legitimacy, based on the three ‘indexes’ of legitimacy: obligation, trust, and normative alignment. We adopted this approach because these measures are frequently used when testing procedural justice theory, particularly within vignette studies (Nivette et al., 2024). We return to these issues about conceptualizing and measuring legitimacy in the discussion.
Based primarily on surveys, researchers have found a correlation between perceptions of procedural justice and police legitimacy across many different cultural and policing contexts (Walters and Bolger, 2019). However, the theory may not operate in the same way in different cultural and policing contexts (Akinlabi and Murphy, 2018; Sun et al., 2017). Specifically, some have highlighted how, in societies where police are corrupt, where they represent legacies of colonial power, and/or where they fail to achieve basic levels of security, public expectations and perceptions of police are more instrumental and less concerned about procedurally just policing (Bradford et al., 2014; Tankebe, 2008). This contrasts with societies where democratic policing is the norm and where trust is relatively high(er) and stable over time. In these societies, including many European countries, the assumption is that public expectations are more attuned to procedural concerns and interpersonal treatment than effectiveness (Na et al., 2023). Yet these concerns about generalizability have mostly been distilled into a dichotomy, focusing on comparisons between ‘Western societies’ and the Global South (e.g., Ghana, South Africa, and Brazil), with less attention to comparative research and assessing heterogeneity within Anglo-Saxon and continental European societies.
On the one hand, there is an assumption that procedural justice theory should operate similarly across contexts, at least where policing is based on democratic principles (Jackson et al., 2024). On the other hand, even in democratic societies, contexts of policing vary in their operations, public expectations, and historical roots (De Maillard and Roché, 2018). Policing in the United States, for example, can be characterized as more aggressive than policing in many European countries (Braga et al., 2019), particularly when measured using lethal police violence (Hirschfield, 2023). Within Europe, baseline levels of trust in police also differ across and within countries (Schaap and Scheepers, 2014). Routine policing operations also differ, whereby police may be more or less engaged in foot patrol, traffic control, and neighbourhood policing, which can in turn influence the likelihood and nature of police-citizen contacts (De Maillard et al., 2018; Schaap, 2021). All of these characteristics may not only influence the strength of the relationship between police treatment and attitudes, but also the underlying mechanisms. For example, in contexts where police tend to be more aggressive, with more potential for conflictual or violent encounters, individuals may opt to accept decisions and comply in the moment out of fear (Jackson et al., 2022). In contexts where trust in police is high and stable, the influence of a single encounter with police on attitudes may also be weaker due to strong pre-existing norms and trust, leading to a ceiling effect. In these contexts, individuals may also be more ‘forgiving’ of single transgressions, as high levels of historical trust can act as ‘Teflon’, minimizing the effects of negative events on perceptions of police (Thomassen et al., 2014).
How can we evaluate to what extent these effects work the same way in different contexts? One common approach is to evaluate correlations between police contact and attitudes from (typically) surveys across countries (Roché and Fleming, 2022). However, these general, retrospective police contact questions cannot capture the varying situations in which individuals are exposed to police actions and treatment in daily life (Nivette et al., 2023). Another approach would be to conduct replications of experimental vignettes depicting similar police-citizen interactions across countries. Even considering the number of experimental vignettes testing procedural justice theory around the world, the designs, measures, and specific contexts of the interactions differ widely across individual studies, making comparisons challenging. While the situation itself will inevitably differ according to the context of policing, the ways in which police express procedural justice, according to theory, can be better synchronized across studies and contexts. The current study therefore aims to replicate and extend, in the Dutch context, prior studies conducted outside of Europe, namely in the United States (more details provided below).
Evaluating assumptions and mechanisms
Experimental vignettes are a common tool used to evaluate theoretical mechanisms related to procedural justice theory, as well as criminology more broadly (Nivette et al., 2024). As such, it is necessary to critically evaluate the causal assumptions underlying these experimental designs (Dafoe et al., 2018; Nivette and van der Vegt, 2025). While randomization is often considered sufficient for identifying causal effects, some have noted that such reliance can overlook the need to assess underlying mechanisms and the role of context (Sampson, 2010). If these assumptions are violated, or if they remain unexamined, findings may provide only a partial or even misleading account of the processes under study.
In order to critically assess the extent to which the causal effects and mechanisms of procedural justice can be replicated in this context, we implement a combination of closed-ended placebo follow-up questions and open-ended questions (Nivette and van der Vegt, 2025). Placebo questions are important to assess the extent to which the experimental design satisfied the assumption of information equivalence, that is, that the effect runs only through the manipulation and not through some other confounding or alternative pathway (Dafoe et al., 2018). Placebo questions should not be mediating mechanisms as defined in the theory, but ideally represent pre-existing (to the scenario) or ‘static’ characteristics about the subjects or situation described in the scenario. If the treatment (procedural justice) leads participants to update their beliefs about these placebo characteristics, and these characteristics are also related to the outcome (police legitimacy), then the information equivalence assumption has been violated and the strength of causal claims is weakened.
The open-ended questions probe the possible mechanisms that lead from procedural justice to perceptions of police legitimacy. This allows us to go beyond pre-defined mechanisms (in closed-ended questions), directly asking participants what influenced their perceptions. Text responses may reveal that perceptions of legitimacy were not influenced by treatments, but by other processes or pre-existing attitudes. This approach aligns with the growing body of social science research that makes use of open-ended questions in surveys, which can subsequently be quantitatively analysed through topic modelling to identify common themes (Ferrario and Stantcheva, 2022; Roberts et al., 2014).
Current study and context
The current study aims to replicate previous experimental vignettes depicting an interaction with the police in the context of a traffic stop in the Netherlands (Nivette and van der Vegt, 2025; Reisig et al., 2018; Terpstra and Van Wijck, 2023). The Netherlands offers a context of policing that in many ways contrasts with the United States, and even other Western European countries such as the United Kingdom. In the Netherlands, police are organized under a centralized national agency, the Dutch National Police. According to the European Social Survey (Round 11), trust in police in the Netherlands is relatively high, comparable to Finland, Norway, Sweden, Austria and Germany. High existing trust may translate into generally smaller or null effects of procedural injustice on legitimacy, particularly within the context of a single traffic stop. In relation to traffic enforcement, the Dutch police rely primarily on cameras, targeted traffic controls, and ‘speed trajectory controls’ to detect and enforce traffic laws (Goldenbeld et al., 2019). This means that the likelihood that individuals encounter police through a ‘random’ traffic stop on the road, as depicted in a number of vignettes set in the United States (e.g., Solomon, 2019), would be very low in an average Dutch person's life.
Instead, Dutch residents are generally more likely to encounter police during targeted traffic stops, primarily for bicycles and scooters/mopeds. These targeted traffic stops aim to, for example, check lights, speed and, in relation to scooters, helmets, insurance and maximum speed. The speed limit for scooters varies according to the road type (30, 40, or 45 km/h), while the maximum designed (manufactured) speed allowed for a scooter is 45 km/h. During a traffic control, the Dutch police can use a ‘roller bench’ [rollerbank] to test whether a scooter has been ‘tuned up’, meaning it has been deliberately altered to be able to drive faster than 45 km/h. Residents travelling by bicycle or scooter may therefore encounter these traffic controls in busy streets, or specific areas, for example, close to a school. The scenario described in our experimental vignette depicts an encounter between a police officer and subject during this type of traffic stop. More information about the content of the scenario is provided in the ‘Methods and data’ section.
We constructed our hypotheses based on theory and past experimental evidence that exposure to elements of procedural injustice will significantly affect subsequent perceptions of encounter-specific police legitimacy. We use a 3 × 2 × 2 between-subjects vignette to evaluate the effect of three of the four elements of procedural justice (i.e., respect, trustworthy motives, neutrality) on perceptions of police legitimacy. We sought to disaggregate the four elements of procedural justice within the vignette, as there is evidence that separate components may have differential effects, indicating that individuals weigh these components independently (Nivette et al., 2024; Solomon, 2019; Tyler and Blader, 2000). In addition, following previous evidence that effects of procedural justice, and particularly disrespect, are asymmetrical (Choi, 2021), we evaluate to what extent disrespect has a significantly stronger effect on perceptions compared to respect. Specifically, we hypothesize:
H1a: Participants exposed to respectful treatment by the police officer will report higher police legitimacy compared to those exposed to neutral and disrespectful treatment, respectively.
H1b: Participants exposed to disrespectful treatment by the police officer will report lower police legitimacy compared to those exposed to neutral treatment, but those who were exposed to respectful treatment will not report higher police legitimacy compared to neutral treatment (asymmetry hypothesis).
H2: Participants exposed to trustworthy motives by the police officer will report higher police legitimacy compared to those exposed to the scenario in which the officer says nothing regarding motives.
H3: Participants exposed to neutral decision-making by the police officer will report higher police legitimacy compared to those exposed to officers who demonstrate biased decision-making.
We use both follow-up placebo test questions about the scenario and open-text answers to evaluate the causal mechanisms underlying participants’ reactions to the vignette. The placebo questions aim to measure to what extent participants update beliefs about background characteristics and thereby open up a ‘back door’ causal pathway to perceptions of police legitimacy. In particular, we explore to what extent exposure to procedurally just treatment influences beliefs about police effectiveness, a major confounding pathway within procedural justice theory. The open-text questions serve to explore participants’ own reasoning and justifications for their answers to the quantitative items measuring encounter-specific police legitimacy. Ideally, participants will justify their reasoning based on characteristics of the vignette (i.e., the manipulations) and the mechanisms expected by the theory. For example, if participants exposed to procedurally just treatment state in their open-text responses that they would agree to comply with the officer because of the specific officer's respectful, trustworthy, and/or neutral behaviour in the scenario, this could be interpreted as evidence in support of the theory. Likewise, reasoning in texts that points to normative expectations and motivations for situational compliance and trust (e.g., it is the right thing to do) could be taken as evidence in support of the theory. We may also identify alternative mechanisms in the text, such as reflections on social identity, belonging, and personal sense of power/authority (Jackson and Pósch, 2019). The hypotheses and analysis plan were pre-registered on the Open Science Framework (OSF) on 23 May 2024 (https://doi.org/10.17605/OSF.IO/7AY9J).
Methods and data
The data are drawn from a representative sample of Dutch adults (n = 2259) using the Longitudinal Internet Studies for the Social Sciences (LISS) Panel, managed by the non-profit research institute Centerdata (Tilburg University, the Netherlands). The LISS Panel consists of a representative sample of Dutch households (about n = 5000 households and n = 7500 individuals 16+ years old) drawn from the population register of Statistics Netherlands. Participants are invited to take part in the panel; no self-registration is allowed. Individuals or households without a computer or internet access are loaned equipment to participate. A random sample of ∼3000 respondents from the panel was invited to participate in the study, with the expectation that the response rate would be ∼70%. The initial sample was 2259, with a response rate of 75.3%. Invited panel members are provided monetary compensation for completing questionnaires. The study was fielded in June 2024.
An a priori power analysis was conducted to determine the sample size needed to detect main and interaction effects. A sample with n = 175 per condition (total n = 2100 across the 12 conditions) was determined to have sufficient power to detect main effects of each element of procedural justice, two-way and three-way interaction effects between elements, and marginal contrasts. Our calculation of interaction effects was based on theory, and how procedural justice has been translated into police training, wherein all four elements are considered important for the implementation of procedurally just policing (Antrobus et al., 2019; Weisburd et al., 2022). If all four elements are needed for the ‘total package’, we would expect that each (positive/negative) element combined produces an additive effect on police legitimacy. However, we did not have adequate information about how large these interactive effects may be, as few studies at the time had assessed each element separately, and those that had found mixed results (e.g., Hazen and Brank, 2022). As a result, it is important to keep in mind that our power calculations may be inadequate, given that the power needed for interaction effects depends on the expected effect size and the type of interaction (Mize, 2019). This is also why we did not pre-register hypotheses regarding the interactions, and will consider any interaction results tentative and exploratory. The R code and G*Power protocol for all power analyses are provided in the pre-registration.
Design
The vignette was a between-subjects 3 × 2 × 2 factorial design. The three factors reflected procedural justice dimensions of respect, trustworthy motives, and neutrality. Respect consisted of three levels to reduce the comparison of extremes: disrespect (i.e., shouting at the subject), neutral, and respect (i.e., saying please, thank you). Trustworthy motives consisted of two levels, in which the officer explains their motives (trustworthy) or says nothing regarding motives (none). Neutrality consisted of two levels in which the officer stops another scooter driver without a helmet (neutral), or the officer says nothing and the subject notices that other drivers are not stopped (bias). The fourth element (voice) was presented in all scenarios as the officer asking the subject if he had any questions. This element was fixed because it is part of standard procedure for police when writing tickets in the Netherlands. The full text of the vignette is provided below, with each element labelled:
Prompt: In a moment, you will read a short story about a traffic control. We would like to ask you to read this story carefully. You will then be asked questions about it.
Den is riding his scooter to work and sees a police traffic control further down the road. As he gets closer to the traffic control, a policeman asks him to stop. A male police officer walks up to Den's scooter and says: ‘[ Den gives him his papers and the officer starts inspecting his scooter for defects and places the scooter on a ‘roller bench’ to check its maximum speed. After a short while, the cop says, ‘I am going to give you a fine. This scooter has been souped up and you were not wearing a helmet. You will be fined for these offences and you will not be allowed to ride your scooter until it is officially inspected. [ Den signs the report and the officer says, [ [
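The factorial structure described above can be sketched in a few lines of Python. This is only an illustration: the level labels, the function name, and assignment with equal probability per cell are assumptions, since the assignment procedure used by the panel software is not described here.

```python
import itertools
import random

# Level labels are shorthand for the manipulations described in the text.
RESPECT = ("disrespect", "neutral", "respect")
MOTIVES = ("trustworthy", "none")
NEUTRALITY = ("neutral", "bias")

# Full crossing of the three factors: 3 x 2 x 2 = 12 between-subjects cells.
CELLS = list(itertools.product(RESPECT, MOTIVES, NEUTRALITY))


def assign_cell(rng):
    """Assign one participant to a cell with equal probability (an assumption)."""
    return rng.choice(CELLS)


rng = random.Random(2024)  # arbitrary seed, only to make the sketch reproducible
example_cell = assign_cell(rng)

# Target sample implied by the power analysis: 175 participants per cell.
TARGET_PER_CELL = 175
target_total = TARGET_PER_CELL * len(CELLS)
```

Multiplying the per-cell target by the number of cells recovers the total of 2100 reported in the power analysis.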
Prior to reading the vignette, participants were asked about their general attitudes toward police and if they have been stopped by police in the past 12 months. Following the vignette, participants were presented with follow-up questions in four blocks: (1) placebo test questions about the subject, (2) placebo test questions about the officer and area, (3) more ‘traditional’ mechanisms questions about perceptions of procedural justice specific to the scenario, and (4) encounter-specific legitimacy questions with open text responses. The presentation of these four blocks of questions was randomized so that each respondent had an equal chance of being assigned to each of the 24 possible block order permutations.
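The block-order randomization can be illustrated with a short sketch. The block labels and the function name are hypothetical; the sketch only shows why four blocks yield 24 possible orders and how one order might be drawn with equal probability.

```python
import itertools
import random

# Hypothetical labels for the four follow-up blocks described in the text.
BLOCKS = (
    "placebo_subject",
    "placebo_officer_area",
    "procedural_justice_perceptions",
    "legitimacy_with_open_text",
)

# All orderings of four blocks: 4! = 24 permutations.
ORDERS = list(itertools.permutations(BLOCKS))


def draw_block_order(rng):
    """Draw one block order, each permutation being equally likely."""
    return rng.choice(ORDERS)


order = draw_block_order(random.Random(1))  # arbitrary seed for reproducibility
```

Because the order is drawn independently of the vignette condition, it can later be entered as a control variable in the models.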
Measures
Dependent variable
Four items were used to measure encounter-specific police legitimacy, adapted from McLean (2020), Reisig et al. (2018) and Hamm et al. (2017). Participants were asked to what extent they agree or disagree, on a 5-point Likert scale, with statements about whether they would feel obliged to do what the officer asks in the situation, whether they would accept the officer's decision in the situation, whether the police in the situation care about the people in the community, and whether they would trust the officers in the situation. Responses were combined into a mean scale for the main analyses (Cronbach's α = 0.79). Full wording for all items and response categories is available in Appendix A in the Supplemental materials.
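As a minimal sketch of how a mean scale and Cronbach's α are typically computed, the Python below applies the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The function names and the toy responses are invented for illustration and are not the study data.

```python
from statistics import mean, variance


def mean_scale(rows):
    """Average the item responses of each respondent into one scale score."""
    return [mean(row) for row in rows]


def cronbach_alpha(rows):
    """Cronbach's alpha from respondent-by-item rows, using sample variances."""
    k = len(rows[0])                     # number of items
    items = list(zip(*rows))             # transpose to item-by-respondent
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented responses on four 5-point items (one tuple per respondent).
toy_rows = [(4, 4, 3, 4), (2, 3, 2, 2), (5, 4, 5, 5), (3, 3, 3, 4), (1, 2, 2, 1)]
toy_scores = mean_scale(toy_rows)
toy_alpha = cronbach_alpha(toy_rows)
```

When every item moves in lockstep across respondents, the formula returns exactly 1; values closer to 0 indicate that the items share little common variance.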
Independent and socio-demographic variables
Prior to reading the vignette, respondents were presented with four items asking to what extent they agreed or disagreed on a 5-point Likert scale about general perceptions of police legitimacy. These questions were designed to reflect a general version of the dependent variable (encounter-specific police legitimacy). The same items were presented, but worded in a general way (e.g., ‘In general, the police care about people in the community’). The internal reliability was good (α = 0.84). Respondents were also asked whether they had been stopped, approached or contacted by police in the past 12 months. If they indicated yes, they were asked to what extent they were satisfied with this contact on a 5-point Likert scale from very dissatisfied to very satisfied. Socio-demographic variables capturing age, gender, education and ethnic background were drawn from the LISS core module, which is filled out by panel participants separately prior to the survey. Gender is measured using three categories: ‘male’, ‘female’ or ‘other’. Education was categorized into three groups according to the Central Bureau of Statistics classification: low (primary school, prevocational secondary education), middle (senior secondary education, pre-university secondary education, vocational secondary education), and high (higher vocational education, university). Ethnicity was grouped into three categories: native Dutch, Western migration background, and non-Western migration background.
Placebo characteristics
The placebo questions reflect a range of characteristics that may influence how respondents perceive the interaction, potentially opening up a ‘back door’ pathway to police legitimacy. Five questions were asked about Den, including to what extent respondents thought he was likely to be a teenager, to have a Dutch background, to have a non-Western migration background, to have been stopped by the police in the past, and to have a history of criminal offenses. Six questions were asked about the officer, police team and neighbourhood in the scenario: whether the officer had a Dutch background, whether he had a non-Western background, whether he was experienced, whether the police team was effective at fighting crime, whether it had more complaints, and whether the situation took place in a wealthy neighbourhood. Respondents could answer using a 5-point Likert scale from very unlikely (0–20% chance) to very likely (81–100% chance). For the police experience placebo, respondents were asked to indicate how much experience they thought the officer had on a 4-point Likert scale ranging from very little experience (less than 1 year) to very experienced (16+ years).
Mechanisms: Closed- and open-ended responses
Four closed-ended items were included to replicate the process by which previous experimental vignettes have conducted manipulation checks and assessed to what extent treatment activates theoretical mechanisms. Specifically, respondents were asked to what extent they agreed on a 5-point Likert scale about the officer's treatment on four elements of procedural justice: respect, trustworthy motives, neutrality and voice.
In addition, following each item on encounter-specific legitimacy, respondents were asked to provide some reasoning for why they chose their answer to a given question. No further instructions were given and no word limit was set, giving the respondents freedom to explain their choices however they wanted.
Analytical approach
Main effects
Our pre-registered analysis plan includes steps to check randomization, test hypotheses, and conduct robustness checks. Any deviations from the pre-registration and additional non-preregistered analyses are specifically noted below.
First, we conducted balance checks using socio-demographic variables (i.e., age, gender, ethnicity, education, experienced police stop) to assess the success of randomization across participants. Analyses of variance were used to assess differences in age across conditions, whereas chi-square tests of independence were used to assess differences in the distribution for categorical variables. There were only three participants who indicated ‘other’ for gender, and so these cases had to be excluded from the analyses. No significant differences between treatments were detected, with one exception (see Table B1 in Appendix B in the Supplemental materials). There was a slight imbalance in the distribution of the trustworthy motives condition in relation to gender, whereby slightly more females were assigned the trustworthy motive condition compared to males (χ² = 4.323, p = .038).
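To make the balance check concrete, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table and converts a statistic to a p-value. It assumes one degree of freedom (two gender categories after excluding ‘other’, crossed with two motive conditions) and no continuity correction; the function names are illustrative, and the cell counts themselves are not reported in this section.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2 x 2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# p-value implied by the reported statistic, under the df = 1 assumption.
reported_p = chi2_p_value_df1(4.323)
```

Under these assumptions, the reported statistic of 4.323 maps to a p-value of roughly .038, which is consistent with the value given in the text.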
Second, in order to test hypotheses H1a, H2 and H3, we estimated two models using Ordinary Least Squares [OLS] regressions. The first model estimates the direct effects of procedural justice treatments on the encounter-specific police legitimacy scale. The second model estimates the conditional effects of procedural justice treatments on police legitimacy by including interaction terms. In addition, we computed Cohen's d for each treatment effect to compare effects with previous studies. All models control for the order of the question blocks.
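The standardized effect size mentioned above can be sketched as follows. The text does not state which variant of Cohen's d was computed, so the pooled-standard-deviation formula used here is an assumption, and the scores in `neutral` and `disrespect` are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math
import statistics


def cohens_d(treated, control):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    n1, n2 = len(treated), len(control)
    v1 = statistics.variance(treated)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical legitimacy scores (NOT the study's data).
neutral = [3.0, 3.5, 4.0, 3.5, 3.0, 4.0]
disrespect = [2.5, 3.0, 3.5, 3.0, 2.5, 3.5]

d = cohens_d(disrespect, neutral)
print(round(d, 3))
```

A negative value indicates that the treated group scored lower than the comparison group, which is the direction reported for the disrespect and bias conditions.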
In order to assess H1b (asymmetry effect), we tested whether the coefficients for disrespect and respect (compared to neutral treatment) were equal using a Wald test-based comparison. A rejection of the null hypothesis here would mean that the coefficients are not equal. If so, and if the coefficient for disrespect has a larger absolute value, then this was taken as support for the asymmetry hypothesis (H1b).
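The logic of this comparison can be sketched with a single-restriction Wald statistic, W = (b1 − b2)² / (Var(b1) + Var(b2) − 2 Cov(b1, b2)), referred to a chi-square distribution with one degree of freedom. The coefficients and standard errors below are those reported in the Results, but the covariance between the two coefficients is not reported; the value of `assumed_cov` is purely an assumption (roughly what would arise if both dummies shared a reference group of similar size), so the output is illustrative and not a reproduction of the published F test.

```python
import math


def wald_equal_coefficients(b1, b2, var1, var2, cov12):
    """Wald test of H0: b1 == b2 for two coefficients from the same model."""
    diff = b1 - b2
    var_diff = var1 + var2 - 2.0 * cov12
    w = diff ** 2 / var_diff
    # Chi-square upper-tail probability with 1 degree of freedom.
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p


# Reported estimates (relative to neutral treatment).
b_dis, se_dis = -0.336, 0.032
b_res, se_res = 0.110, 0.033

# ASSUMPTION: the covariance is not reported; this value is hypothetical.
assumed_cov = 0.5 * se_dis * se_res

w, p = wald_equal_coefficients(b_dis, b_res, se_dis ** 2, se_res ** 2, assumed_cov)

# Decision rule described in the text: reject equality AND |b_disrespect| is larger.
asymmetry_supported = p < 0.05 and abs(b_dis) > abs(b_res)
print(round(w, 1), asymmetry_supported)
```

In practice the covariance would be taken from the estimated variance-covariance matrix of the fitted model rather than assumed.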
Data for the current study are available from the LISS Panel Archive (https://www.dataarchive.lissdata.nl/). Analytic code is available on the OSF (https://doi.org/10.17605/OSF.IO/7AY9J).
Assessing mechanisms
First, we replicated previous studies by estimating the effects of treatment on the four elements of procedural justice (i.e., respect, trustworthy motives, neutrality/bias, and voice). Second, in order to assess information equivalence, we analysed to what extent treatments influenced background and possible confounding pathways measured by the placebo test questions. Importantly, for information equivalence to be violated, it is not enough that the treatment leads to updating background beliefs (i.e., changes in perceptions of placebo characteristics); these characteristics must also be related to the outcome. We therefore explored to what extent characteristics that were activated by the treatment were also correlated with perceptions of police legitimacy. While this test is inadequate for detecting intermediary effects (see Blackwell et al., 2024), it nevertheless offers insight into whether these beliefs might act as important confounds and/or open up a ‘back door’ pathway to police legitimacy.
Third, mechanisms were further explored through topic modelling of open-text responses. Topic modelling identifies underlying themes in a corpus of texts through word co-occurrences. It represents each text as a mix of topics, estimating a probability between 0 and 1 for the text belonging to each respective topic (Blei et al., 2003; Roberts et al., 2014). For example, a text mentioning ‘safe’, ‘danger’ and ‘protect’ may be assigned a probability of .90 for belonging to the topic ‘safety’. We used a keyword-assisted topic model (Eshima et al., 2024), enabling us to pre-define relevant keywords for treatments and mechanisms that we expected to appear. Topics and keywords (in Dutch) (Table C1 in Appendix C in the Supplemental materials) were pre-defined based on a random sample of 200 responses per question, which were subsequently excluded from the final topic model. Based on this assessment, many new topics emerged that were not directly in line with the expected treatments and mechanisms. In line with the preregistration, we assessed (1) the direct and interaction effects of the treatments on topic probabilities (mechanisms) with an OLS regression and (2) whether topic probabilities (mechanisms) had an effect on overall police legitimacy scores through ordinal logistic regression.
Data quality
We assessed data quality for speeding and careless answers using the open-text survey feedback responses as well as the time taken to complete the survey. The open-text survey feedback revealed two participants who admitted that they gave completely random answers. These participants were excluded from all analyses. An evaluation of the time taken to complete the survey showed that the median time taken was 7.2 minutes. We opted to exclude participants who completed the survey in a very short amount of time (not pre-registered) indicative of speeding and careless responding, that is, <180 seconds, below the fifth percentile of times (n = 130). We report the results in the main text using the sample excluding these participants, and the main results using the complete sample (excluding the two that admitted random answers) in Appendix D in the Supplemental materials.
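The exclusion rule for speeding can be sketched as a simple filter on completion time. The durations below are hypothetical and the function name `flag_speeders` is introduced only for illustration; the 180-second threshold is the one stated above.

```python
import statistics


def flag_speeders(durations_seconds, threshold_seconds=180):
    """Return True for each respondent who finished faster than the threshold."""
    return [t < threshold_seconds for t in durations_seconds]


# Hypothetical completion times in seconds (NOT the study's data).
durations = [95, 150, 200, 310, 432, 450, 500, 610, 720, 1500]

flags = flag_speeders(durations)
kept = [t for t, is_speeder in zip(durations, flags) if not is_speeder]

n_excluded = sum(flags)
median_minutes = statistics.median(durations) / 60
print(n_excluded, len(kept), round(median_minutes, 2))
```

Reporting results for both the filtered and the complete sample, as done here, shows whether conclusions depend on this non-preregistered exclusion.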
The sample therefore used for the main analyses was n = 2129. Using this sample, there was no missingness on the dependent variable and most background variables, whereas there was 0.4% missing on education and 2.07% missing on ethnic background for the full sample. Listwise deletion was therefore used for analyses that included the background variables (i.e., balance checks). 2
Results
Descriptive statistics are presented in Table 1. The average age was 55 (SD = 18), which is higher than the average age of the Dutch population. 3 Those with higher education were overrepresented in the sample compared to the Dutch population (i.e., 44% in the sample vs. 32% in the population), while the proportion of respondents with a Dutch background is slightly larger but generally in line with population statistics (79% in the sample vs. 75% in the population). 4
Table 1. Descriptives for analytical sample (N = 2129).
Main effects
The main effects of treatment on police legitimacy are presented in Figure 1. The figure includes both the short form results (Model 1) and the long form results (Model 2), which include all interaction effects (not shown). The coefficients for both Models 1 and 2, including all interaction effects, are presented in Table D1 in Appendix D in the Supplemental materials. The results show that both disrespectful (b = −0.336, SE = 0.032, p < .001, Cohen's d = −0.52) and respectful (b = 0.110, SE = 0.033, p < .001, d = 0.16) treatment had a significant effect on perceptions of police legitimacy (H1a); however, the size of these effects differed. We assessed the equality of coefficients and found that the size of the effect of disrespect was significantly larger than the effect of respect (F = 191.72, p < .001), indicating an asymmetrical effect (H1b). Both the presence of trustworthy motives (H2) and biased decision-making (H3) had a significant effect on police legitimacy (b_bias = −0.102, SE = 0.027, p < .001, d = −0.158; b_motives = 0.108, SE = 0.027, p < .001, d = 0.169). However, the size of these effects is relatively small (|d| < 0.2). The adjusted R2 for this model was also relatively low (0.104), suggesting that procedural justice treatments only explained a small amount of variation in perceptions of police legitimacy.

Figure 1. Main effects of procedural justice on mean legitimacy scale.
The long form results include all interaction effects, and therefore can be interpreted as the effect at the baseline of other conditions. There were no significant interaction effects (see Model 2, Table D1 in Appendix D in the Supplemental materials). While the interaction terms were not significant, this may be due to a lack of power and not necessarily because there are no moderation effects. We therefore plotted the predicted values and 95% confidence intervals for each treatment condition (see Figure 2). There is no clear additive effect, although those exposed to the disrespect/no trustworthy motives/bias condition showed on average the lowest absolute levels of police legitimacy (3.224) and those exposed to the respect/no bias/trustworthy motives condition showed the highest levels (3.930). However, in practical terms, the absolute difference in scores on police legitimacy between those in the most positive and most negative conditions was minimal (0.706 points).

Figure 2. Predicted values for police legitimacy across treatment conditions.
Assessing mechanisms
Figure 3 shows the results for the quantitative analysis of procedural justice treatment on each respective mechanism: the officer treated Den with respect (respect), the officer was neutral (neutrality/bias), and the officer cared about Den's well-being (trustworthy motives). The results show that each treatment significantly influenced each mechanism as expected. Similar to the main effects, the magnitude of difference was relatively small for both bias and trustworthy motives (see full results in Table E1 in Appendix E in the Supplemental materials). Additional exploratory analyses (not pre-registered) showed that in particular exposure to the disrespect condition influenced perceptions about other procedural justice elements as well, even perceptions about whether the officer listened, which was held constant across scenarios (see Figures E1–E3 in Appendix E in the Supplemental materials). By contrast, the other treatments, neutrality and trustworthy motives, had minimal effects on other mechanisms outside their own.

Figure 3. Effects of each treatment condition on their respective mechanisms.
Information equivalence
The short form results for the placebo tests for Den and the officer are presented in Figures 4 and 5, respectively (for full tables, see Tables E2 and E3 in Appendix E in the Supplemental materials). The results show that participants did update their beliefs about Den, particularly those exposed to the disrespect scenario. Specifically, disrespect increased the likelihood that participants perceived Den as a teenager, as having a non-Western migration background, and as having been stopped previously. Bias primarily influenced perceptions of Den's ethnicity, whereby bias was associated with a higher probability of having a non-Western migration background.

Figure 4. Placebo effects on characteristics related to Den.

Figure 5. Placebo effects on characteristics related to the officer and neighbourhood.
Figure 5 shows that disrespect increased the likelihood that participants perceived that the officer was less experienced, that the police in that area had prior complaints, and were less effective at fighting crime. Participants exposed to the disrespect condition were also less likely to perceive that the scenario took place in a wealthy neighbourhood. The bias condition also led participants to perceive that the department had prior complaints and was less effective at fighting crime.
Next, we explored whether these beliefs were correlated with police legitimacy, thus violating the information equivalence assumption and opening up a ‘back door’ pathway to the outcome. We focused on selected placebo characteristics that were significant in previous models and could theoretically confound the effect of procedurally just treatment, namely perceptions of Den's age, ethnicity, and prior criminal behaviour, as well as the officer's ethnicity, experience, prior complaints, effectiveness, and neighbourhood characteristics (see Figure 6). Most notably, the results showed that the following perceptions were significantly correlated with police legitimacy: that Den had a non-Western migration background (r = −0.08), the officer's experience (r = 0.31), whether the department had prior complaints (r = −0.26), police effectiveness (r = 0.32), and whether the neighbourhood was wealthy (r = 0.15). While we cannot draw strong conclusions about the intermediary effects of placebo characteristics here, the results provide an indication that the treatment led participants to update their beliefs on other characteristics, particularly effectiveness, and at the very least the information equivalence assumption has been violated.
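To show why even the smallest of these correlations can be statistically significant in a sample of this size, the sketch below converts a Pearson correlation into a t statistic, t = r √((n − 2) / (1 − r²)), and an approximate two-sided p-value. It assumes that each correlation was computed on the full analytical sample (n = 2129), which the text does not confirm, and it uses a normal approximation to the t distribution, which is reasonable only because n is large. The helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_significance(r, n):
    """t statistic and approximate two-sided p-value for a correlation r."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Normal approximation to the t distribution (adequate for large n).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Correlations reported in the text; n = 2129 is an ASSUMPTION for each test.
reported_r = {
    "Den non-Western background": -0.08,
    "officer experience": 0.31,
    "prior complaints": -0.26,
    "effectiveness": 0.32,
    "wealthy neighbourhood": 0.15,
}
results = {name: r_significance(r, 2129) for name, r in reported_r.items()}

for name, (t, p) in results.items():
    print(f"{name}: t = {t:.2f}, p < .001: {p < 0.001}")
```

Statistical significance here says nothing about direction of influence, which is why the text treats these correlations as an indication of a violated assumption and not as evidence of mediation.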

Figure 6. Correlations between selected placebo effects and encounter-specific police legitimacy.
Topic modelling
The results of the topic model are shown in Table 2, with percentages per topic and example responses for each. Frequent terms per topic can be found in Table E5 in Appendix E in the Supplemental materials. In response to the first question, compelled to comply, a large number of texts (50.26%) stated that the moped driver was in the wrong, and thus should accept the consequences. Several texts referred to the authority role of the police (29.65%) as the reason for complying. Some participants mentioned the officer being respectful or rude (i.e., treatment conditions), and others stated they would comply now but contest the ticket later. In response to the question on obligation to obey, again, many texts stated that the driver was in the wrong (39.20%) and hence should obey. Similar to the previous question, participants mentioned that they should obey because the police are an authority (15.17%). Some mentioned that they would start a discussion with the officer about the ticket (9.88%) or contest the ticket later (9.97%); others said they always follow the rules and hence would obey (9.38%). The question of whether the department cares about the community showed that the officer explaining his decision (26.70%) and his respectful behaviour (17.72%) contributed to the impression of care. At the same time, a number of texts indicated that participants did not know whether the department cared (22.04%), and others reported their general attitudes about police (9.10%) instead of specific attitudes about the department/officer in the vignette. Some mentioned that the police are just doing their job (7.45%), and that their role is to ensure safety (6.70%). Rudeness of the officer was again mentioned by a small number of participants. Finally, whether participants trust the officers from the department was largely based on general attitudes about the police (30.77%), or officers ‘just doing their job’ (27.04%).
Importantly, a number of participants said they do not know whether they can trust the officers (14.17%) or do not state a reason not to (12.32%). The officer being rude, explaining his decision, and showing bias were also mentioned. Results of the OLS regression of treatments on topic probabilities and the ordinal logistic regression of topic probabilities on perceived police legitimacy showed very limited effects. The full results are reported in Appendices F and G in the Supplemental materials, respectively.
Table 2. Topic model results and samples.
Additional analyses
The results for two pre-registered additional analyses can be found in Appendix H in the Supplemental materials. First, given the argument that items related to obligation to obey and trust or normative alignment reflect conceptually and empirically distinct outcomes instead of one latent construct (legitimacy) (Worden and McLean, 2017), we conducted additional analyses using each legitimacy item as an outcome separately (Tables H1 and H2 in Appendix H in the Supplemental materials). There are four notable findings to report from these analyses: (1) the effect of disrespect in the short form results remained relatively consistent across items, (2) the effect of trustworthy motives was generally weaker across items, and was primarily related to respondents’ perceptions that the police cared about the community, (3) the effect of bias was also generally weaker across items with the exception of trust, and (4) there were no strong interaction effects across all items. Second, we explored to what extent the effect of treatment remained when including a measure of general perceptions of police legitimacy measured prior to reading the vignettes (Table H3 in Appendix H in the Supplemental materials). While the main treatment effects were substantively similar, the effect of general legitimacy was strong and positive (Model 1: b = 0.515, SE = 0.018, 95% CI [0.480, 0.550]). The R2 for this model increased substantially compared to the model without this variable (adjusted R2 = 0.358). There were no significant interaction effects (Model 2, Table H3 in Appendix H in the Supplemental materials). Finally, given that we detected some imbalance between treatments on gender, we ran the main models again including gender as a covariate (not pre-registered). The results remained the same (see Table H4 in Appendix H in the Supplemental materials).
Discussion
The results from the current study show that, while procedurally just treatment by police was statistically related to perceptions of encounter-specific police legitimacy, the strength of the causal effect and underlying mechanisms were not so clear. Results showed that (dis)respect, trustworthy motives, and bias/neutrality were significantly related to police legitimacy (H1a, H2, H3); however, these effects were generally much smaller compared to many effects found in the literature using vignettes (absolute Cohen's d between 0.158 and 0.52). In addition, we detected asymmetrical effects on legitimacy, whereby disrespect had a significantly stronger effect on legitimacy compared to respect (H1b). These results have implications for how procedural justice theory and its mechanisms are tested across contexts. We outline two key findings to consider for future research.
First, while the quantitative effects were generally in line with previous research from the United States (albeit much smaller), the analysis of open text responses revealed new insights. Dutch participants were influenced by their pre-existing attitudes toward police and not solely by the manipulations in the vignette, especially regarding their trust of the police and their perceptions of them caring about the community (Nivette and van der Vegt, 2025). Respondents were also relatively unwilling to extrapolate broader police behaviour based on this single encounter (e.g., ‘The story was about 1 police officer. No idea how the rest behave’). Other mechanisms reflect key differences between study populations. High trust in police in the Netherlands seemed to manifest in responses that do not question the police officer in the scenario, and rather express obedience simply because they are an authority. The topics emphasize the general expectation of obedience and law-abiding norms in the country (e.g., ‘If I know that my moped is tuned up and that I am required to wear a helmet, then if I am stopped, I must also accept the consequences’). Furthermore, the responses often drew on general expectations about the police profession, but did not provide justifications connected to the scenario itself (e.g., ‘I generally assume they care when someone chooses this profession’). This suggests that the feeling of obligation was driven to no small extent by non-normative or other reasons, such as expediency or morality (Fine and Rooij, 2021). More broadly, these more general responses may also reflect the changing context of everyday policing in the country, where police are increasingly more ‘distant’ from communities, closing down smaller police stations, promoting online crime reporting, and relying more heavily on automated systems for traffic enforcement (Goldenbeld et al., 2019; Schaap, 2018). 
The increasing distance between the police and public may mean that individuals must draw on broader societal values and beliefs about the police profession, rather than expectations of treatment during interactions.
Notably absent from respondents’ justifications was the need to avoid trouble or (physical) conflict with the officer out of safety concerns, a key non-normative motivation for compliance (Jackson et al., 2022; Reisig et al., 2023). In the Netherlands, only a very small number of participants noted they would comply to avoid escalation of the situation, and some even wrote they would object and initiate a discussion with the officer. This suggests that the context of fear and danger in police-citizen interactions is absent in the Netherlands (Sierra-Arévalo, 2021). Together, these results highlight how different policing contexts give rise to a wide variety of mechanisms that influence perceptions of police legitimacy and situational compliance, many of which cannot be attributed to procedurally (un)just treatment in a specific scenario. However, most mechanisms identified through topic modelling were not affected by the procedural justice manipulations nor related to legitimacy, further calling into question the causal effects and mechanisms underlying the theory.
Second, our analysis of the information equivalence assumption through placebo characteristics revealed that this assumption was violated. This means that we cannot draw reliable conclusions about the causal effects of procedurally just treatment on police legitimacy, as the effect does not run only through the treatment. One important finding was that procedural justice also influenced perceptions of police effectiveness, which does not fit with the theoretical argument that normative (procedural justice) and instrumental (effectiveness) concerns have distinct influences on attitudes and behaviours (Na et al., 2023; Sunshine and Tyler, 2003), and that normative concerns typically have ‘stronger’ effects (larger coefficients) compared to instrumental concerns (Jackson et al., 2024). One counter-argument is that perceptions of effectiveness and procedural justice are part of a broader latent construct of police legitimacy, while feelings of obligation and situational compliance are separate potential outcomes of legitimacy beliefs (Tankebe, 2013). Our results suggest that, at the very least, the theoretical distinction between instrumental and normative motivations is not so distinct among the public. This makes sense given that procedural justice and effectiveness are often highly correlated in surveys (e.g., Na et al., 2023). This relationship may be especially pronounced in the Netherlands, where police trust-building efforts have emphasized delivering reliable, high-quality service to citizens, while placing comparatively less emphasis on procedural justice (Schaap, 2018). As a result, the Dutch police have increasingly standardized their services, for instance, by encouraging citizens to report crimes online, which has in turn reduced the frequency of direct police–citizen interactions.
Relatedly, this would imply that isolating different procedurally just police behaviours, and likewise the treatment effect, is not so clear-cut in vignettes and surveys, and likely even more challenging in real-life encounters. This is compounded by the fact that real interactions are dynamic, wherein both police and individual behaviour change based on situational and interpersonal inputs over time (Piza et al., 2023). More attention is needed to understand these situational dynamics and what aspect(s) drive changes in attitudes and situational compliance in the short and long term (if any). This requires that future research examine more detailed and dynamic interactions, for example, using virtual reality (Van Gelder, 2023), CCTV/body camera footage (Sunde et al., 2023), or within-individual vignettes, in order to disentangle the strength of these effects.
Limitations and conclusions
The strengths of the current study include a large, diverse sample of Dutch residents, disaggregating different elements of procedural justice, and incorporating placebo and open-text questions to interrogate causal assumptions and mechanisms. However, there are also a number of limitations to discuss.
First, the sample of Dutch residents was over-representative of certain demographics. Namely, the sample was slightly older and more highly educated than the Dutch population; however, it was in line with the distribution of migration background in the population. Results from a primarily youth sample may therefore differ, as young people may, for example, place emphasis on different elements of police behaviour and provide different reasons for situational compliance. This is generally related to a broader issue in procedural justice, as well as other criminological theories, regarding the lack of formal theory that clearly outlines predictions about the sequence and size of effects of treatment (i.e., police contact and behaviours) on a given outcome (Frankenhuis et al., 2023). Recent critiques have emphasized this issue in procedural justice theory, arguing that, even though differences are acknowledged, theory testing still often relies on a ‘consensus perspective’ that ignores pluriformity and is unable to ‘open the black box’ of what happens within daily police-citizen interactions (Schaap and Saarikkomäki, 2022). More formal theoretical modelling and testing is needed to determine the specific concatenations of mechanisms that determine attitudes across contexts. This would require further theoretical and empirical work that can disentangle the different mechanisms and measure both short- and long-term effects of these events. If using vignettes, this would require a different experimental design that can formally model and test potential mediating mechanisms (e.g., Auspurg and Düval, 2024), and/or incorporating methods of process-tracing within experimental designs (Herrmann, 2025).
Second, while placebo and open-text responses provide needed insight into assumptions and mechanisms, they do not capture all potential confounds or ‘back door’ pathways that may be operating. In addition, open-text responses may not fully reflect the cognitive decision-making processes that are at work in a given situation (Herrmann, 2025). More real-time measurement of decision-making processes is necessary to capture how participants are experiencing and interpreting input from moment to moment.
Third, it is important to note that we had to balance the length of the questionnaire with the burden on participants when including open questions. We therefore opted to select key items related to encounter-specific legitimacy that are typically used across previous vignette studies. As a result, our items may only partially capture dimensions of these constructs, including police legitimacy and effectiveness. For example, our measure of trustworthiness and normative alignment (i.e., police care for the community) could have more explicitly measured shared values and norms alongside their care for the people in the community (Hamm et al., 2017). Future research should incorporate more items used in legitimacy scales (normative and non-normative, see Reisig et al., 2023), as well as other outcomes, in combination with open text responses to evaluate the decision-making processes that determine support and compliance with the police.
Overall, the current study was able to replicate the statistical relationship between procedurally just treatment and encounter-specific police legitimacy in the Dutch context. However, an assessment of the causal assumptions and underlying mechanisms suggests that police behaviour does not affect legitimacy in the way that is expected by the theory. Not only was a key causal assumption violated, indicating that the effect does not run only through the treatment, but respondents suggested that they would comply and trust police in the situation based on broader societal norms and expectations, not necessarily because of police behaviour. Methodologically, our study suggests that researchers using experimental vignettes need to pay more attention to causal assumptions. At the very least, those employing experimental vignettes must include certain background characteristics (e.g., race/ethnicity, neighbourhood characteristics) and effectiveness as covariate controls (see, e.g., Metcalfe and Pickett, 2021). Theoretically, our study shows that those interested in testing procedural justice theory (and criminological theories more broadly) must think critically about identifying and evaluating competing mechanisms in different societal and institutional contexts.
Supplemental Material
Supplemental material, sj-docx-1-euc-10.1177_14773708251407868 for Assessing the causal mechanisms underlying procedural justice theory in the Netherlands by Amy E Nivette and Isabelle van der Vegt in European Journal of Criminology
Footnotes
Author contributions
AEN designed the vignettes and drafted the questionnaire. AEN and IvdV contributed to the implementation, analysis and writing of the paper.
Ethical approval and informed consent
The study is approved by the Ethics Committee of the Faculty of Social and Behavioural Sciences of Utrecht University (24-0098). In the Longitudinal Internet Studies for the Social Sciences (LISS) Panel, panellists complete a consent once prior to inclusion and participation in the panel. This consent applies to all questionnaires administered in the LISS Panel.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data collection was funded by a LISS Panel Grant (2023). The LISS Panel data were collected by the non-profit research institute Centerdata (Tilburg University, the Netherlands). Funding for the panel's ongoing operations has been received from the Domain Plan SSH and ODISSEI since 2019. The initial set-up of the LISS Panel in 2007 was funded through the MESS project by the Netherlands Organization for Scientific Research (NWO). AEN was supported by the NWO Talent Grant (Grant Number: VI.Vidi.191.135).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data are available via the LISS Panel Data Archive (https://www.dataarchive.lissdata.nl/), and code to reproduce the analyses is available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/7AY9J).
Supplemental material
Supplemental material for this article is available online.
Notes
References