Abstract
Violence against healthcare staff, including a threat or an act of violence toward people during their work, poses a physical and psychological risk to workers internationally. Screening is an important strategy in preventing violence against healthcare professionals. The aim of this systematic review was to synthesize evidence on the predictive validity of risk assessment tools used to screen for violence and aggression risk toward healthcare workers in emergency departments (ED) and psychiatric departments (PD). Primary studies that examined the predictive validity of risk assessment tools for workplace violence were identified via a systematic search of Medline, PsycINFO, Embase, and the Cochrane databases. There were 62 eligible studies, ten of which had a lower risk of bias (RoB). High RoB was primarily due to a failure to present calibration measures as part of the analysis. All ten lower-RoB studies adopted a longitudinal design and were conducted in PDs. These ten highest-quality studies reported on eight different instruments, four of which showed acceptable to outstanding predictive performance. The Dynamic Appraisal of Situational Aggression and the Brøset Violence Checklist showed the best predictive performance; they were also validated in emergency departments and are best suited for short-term risk prediction. We recommend that the selection of a risk assessment tool should consider the following: (a) the target population, (b) the violence operationalization, and (c) the purpose of the monitoring. We note that the use of a screening tool should be a part of a multicomponent strategy to ensure staff safety.
Introduction
Violence against staff poses a risk to healthcare workers internationally (Grant et al., 2022; Hawkins & Ghaziri, 2022). Specifically, research has consistently shown high rates of violent incidents in emergency departments (ED) and psychiatric departments (PD) (Liu et al., 2019; Stowell et al., 2016; Vento et al., 2020). Recent meta-analyses showed that 77% to 84% of ED and 67% of PD healthcare workers experience workplace violence (Aljohani et al., 2021; Liu et al., 2019). Violence against healthcare workers has increased over the recent decade (Nikathil et al., 2018; O’Brien et al., 2024), particularly during the COVID-19 pandemic (McGuire et al., 2022; Odes et al., 2023; Ramzi et al., 2022). This increase has brought the issue of safe work environments to the forefront (Joyce et al., 2023).
Screening for violence risk is an important part of the strategy to prevent violence against staff. To improve screening for violence risk, several structured instruments have been developed and validated (Cabilan & Johnston, 2019; Fricke et al., 2023). Most available risk assessment tools were first developed in either psychiatric (Hvidhjelm et al., 2023) or emergency (Cabilan et al., 2023) settings, but later validated in the other setting (Dugré et al., 2019). There have been many attempts to provide evidence-based guidance on using risk assessment tools in healthcare. Earlier reviews often focused on a specific healthcare setting (D’Ettorre et al., 2020; D’Ettorre & Pellicani, 2017; Hamrick et al., 2023; Lorettu et al., 2020; Mento et al., 2020; Ogonah et al., 2023; Sammut et al., 2023), a specific tool (O’Shea & Dickens, 2014; O’Shea et al., 2013), or general healthcare settings (Fricke et al., 2023; Ghosh et al., 2019). Given that the ED and PD are the two settings with the highest rates of workplace violence, and that organizations are seeking improved safety (Goldstein, 2022), consistency in the use of assessment tools for risk of violence across these settings would be beneficial. Such an approach can enable (a) consistent observation of patients and (b) their (re)assessment if they transfer to a new setting (e.g., from ED to PD), which can be useful for early identification and prevention (ACSQHC, 2018; Holland et al., 2021; Munich et al., 2021).
This review identified evidence-based tools that may have practical utility in these environments. The present study reports the evidence on the predictive validity of risk assessment tools used to screen for violence and aggression risk toward healthcare workers in emergency and psychiatric departments. Our interpretation focuses on tools with the most reliable evidence of predictive performance.
Methods
Design
We conducted a systematic review, following the Cochrane methodology (Higgins et al., 2024) and report our findings according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Page et al., 2021). The protocol of the review was prospectively published at the Open Science Framework (https://osf.io/gw7an/).
Eligibility Criteria
Eligible studies were (a) primary research published in a peer-reviewed journal; (b) evaluations of the predictive validity or performance of instruments developed to identify the risk of violence against healthcare workers; and (c) tested in an emergency or psychiatric department. We excluded qualitative studies and studies that were conducted in forensic and outpatient settings, or when the outcome was assessed post-discharge (e.g., violence in the community).
Search Strategy
Four databases (Medline, PsycINFO, Embase, and the Cochrane Database of Systematic Reviews) were searched separately from inception until July 18, 2023. No limits, restrictions (including language), or search filters were used (see Supplemental materials, Table S1). We also hand-searched the reference lists of relevant systematic reviews and included studies for forward and backward referencing. The results from all searches were imported into Covidence, and duplicates were removed. Three authors (S. K., S. R. E. D., M. L.) conducted all screening, quality assessment, and data extraction. Two authors independently reviewed the titles and abstracts of all search results, and then the full text of relevant articles to determine eligibility. Consensus or discussion with a third reviewer was used if the two reviewers did not agree on eligibility.
Risk of Bias and Quality Assessment
All eligible studies were independently assessed by two authors for Risk of Bias (RoB) using the Prediction model Risk of Bias Assessment Tool (PROBAST) (Wolff et al., 2019). PROBAST is designed to assess the methodological quality of studies investigating diagnostic or prognostic accuracy, providing an estimation of RoB for each study, with low, unclear, or high RoB and concern (applicability) categorizations. PROBAST consists of four domains (participants, predictors, outcomes, and analysis), containing 20 signaling questions for RoB assessment. In addition, applicability concerns are deemed “high” when the population, predictors, or outcomes of the study do not match those in the review question.
Following the PROBAST tool workflow (Moons et al., 2019), evidence of external validity (applying the conclusions of a scientific study outside the context of that study) was prioritized when controlling for RoB. Consensus or discussion with a third reviewer was used if the two reviewers did not agree on RoB scoring.
Data Extraction
Two authors (S. K., S. R. E. D., or M. L.) independently completed data extraction. All inconsistencies were discussed and resolved via online meetings. All data were extracted and placed in Excel sheets (Microsoft Corp., Redmond, WA, USA) developed by the authors. Extracted data included information on the prediction tool and outcome, country, data collection process, sample characteristics, inclusion/exclusion criteria, diagnosis, previously reported violence, predictive validity (performance measures), and reliability (internal consistency measures using kappa and Cronbach’s alpha). Predictive validity data included sensitivity and specificity, positive and negative predictive values, area under the receiver operating characteristic curve (AUC), and odds ratios (the odds of a violent individual being identified as violent using the tool). When considering AUC, we used the following categorizations: ≤ 0.69 poor, 0.70 to 0.79 acceptable, 0.80 to 0.89 excellent, and 0.90 to 1.00 outstanding (Muller et al., 2005). Calibration refers to the agreement between observed outcomes and predictions, demonstrated via graphical assessment (calibration plot or slope), the Hosmer–Lemeshow goodness-of-fit test, or calibration-in-the-large (Hilden et al., 1978; Steyerberg et al., 2010). Overall performance measures quantify the distance between the predicted outcome and actual outcome (Brier score), or the variance in the data that the model can explain (R²) (Steyerberg et al., 2010). Explained variance is usually presented as a percentage ranging from 0% to 100%, where 0% means that a model does not explain any of the variance in the data, and 100% indicates that it explains all of the variance.
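To make the AUC categorization above concrete, the following is a minimal sketch in Python, not the procedure used in any included study. It computes AUC as the proportion of violent/nonviolent pairs in which the violent case received the higher tool score, then maps the result to the bands listed above. The function names, the rounding to two decimals before banding, and the example scores are illustrative assumptions.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen violent case (outcome 1)
    scores higher than a randomly chosen nonviolent case (outcome 0).
    Tied scores count as half a concordant pair."""
    violent = [s for s, y in zip(scores, outcomes) if y == 1]
    nonviolent = [s for s, y in zip(scores, outcomes) if y == 0]
    if not violent or not nonviolent:
        raise ValueError("need at least one violent and one nonviolent case")
    concordant = 0.0
    for v in violent:
        for n in nonviolent:
            if v > n:
                concordant += 1.0
            elif v == n:
                concordant += 0.5
    return concordant / (len(violent) * len(nonviolent))


def categorize_auc(value: float) -> str:
    """Map an AUC to the bands used in this review (Muller et al., 2005).
    Assumption: the AUC is rounded to two decimals before banding, because
    the bands are stated to two decimals."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    rounded = round(value, 2)
    if rounded >= 0.90:
        return "outstanding"
    if rounded >= 0.80:
        return "excellent"
    if rounded >= 0.70:
        return "acceptable"
    return "poor"


# Hypothetical tool scores and observed outcomes (1 = violent incident).
example_scores = [0, 1, 1, 2, 3, 4, 5, 6]
example_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
example_auc = auc(example_scores, example_outcomes)
print(example_auc, categorize_auc(example_auc))
```

The pairwise definition is used here because it needs no external library; for large samples a rank-based or library implementation would be more efficient.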
Results
Search Outcome
Our search identified 2205 titles, from which we identified 62 eligible studies (Figure 1); 60 were published in English, one in Hungarian, and one in Polish. Most studies were conducted in psychiatric departments or wards (n = 58); four were conducted in EDs. Most studies used violent or aggressive incidents as the outcome (n = 53); nine studies used other outcomes to operationalize violence, including seclusion (n = 5), absconding (n = 4), physical restraint (n = 2), and restraint by administration of intramuscular sedative medication (n = 1). Several studies used multiple measures of outcome (n = 6). Studies were conducted in 22 countries. The majority of the studies were longitudinal (n = 54), seven were cross-sectional, and the design of one was unclear.

PRISMA diagram: Predictive validity of violence screening tools in emergency and psychiatric services: A systematic review.
Identified Violence Risk Assessment Tools
Table S1 presents 33 risk assessment tools reported in the 62 included studies (Supplemental Material). The majority were developed in psychiatric settings (n = 28). Seven tools were developed in correctional psychiatric settings and then adopted for general public application: Violence Risk Screening-10 (V-RISK-10), Structured Assessment of Protective Factors for violence risk, Historical Clinical Risk Management-20 (HCR-20), Novaco Anger Scale (NAS), Short-Term Assessment of Risk and Treatability (START), Measures of Criminal Attitudes and Associates Attitudes Towards Violence scale, and Psychopathy Checklist-Revised (PCL-R). Two instruments were developed in ED settings: the Queensland Occupational Violence Patient Risk Assessment Tool (QOVPRAT; Cabilan et al., 2022) and the Brief Rating of Aggression by Children and Adolescents (BRACHA; Barzman et al., 2013), the latter of which was specifically developed in the ED for the assessment of pediatric psychiatric admissions. Finally, the Aggressive Behaviour Risk Assessment Tool (ABRAT) was developed for other settings (medical and surgical units) but then validated in the ED (Kim et al., 2012, 2022). Four studies included a mixed population of children/adolescents and adults, with the following age ranges: 13 to 77 years (Marques et al., 2015), 9 to 63 years (Miller et al., 2016), 16 to 92 years (Pujol et al., 2020), and 12 to 55 years (Yuniati et al., 2020). Five studies focused specifically on child or adolescent populations (Barzman et al., 2013; De Beuf et al., 2023; Dutch & Patil, 2019; Roaldset et al., 2023; Stafford & Cornell, 2003) (see Table S1). Of the 62 studies, eight did not report any information about the age of the participants.
Quality of Included Studies
No studies had both low RoB and low applicability concerns (see Table S2 for quality assessment results). Figure 2 presents the summary of the quality assessment for RoB and applicability; most studies were classified as high risk or unclear. Given that no studies were classified as overall low RoB by receiving “yes” or “probably yes” across all four domains, we prioritized studies receiving “yes” or “probably yes” for at least three out of four RoB domains and at least two out of three applicability domains. Ten studies met these quality assessment criteria and are described further (Table 1).

Graphical presentation of quality assessment: Risk of bias and applicability.
Risk of Bias and Applicability: Included Studies Only.
Note. RoB = Risk of Bias; + (green) indicates low RoB/low concern, - (red) indicates high RoB/high concern, ? (yellow) indicates unclear RoB/unclear concern about applicability.
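The selection rule described above (favorable ratings on at least three of four RoB domains and at least two of three applicability domains) can be sketched as a simple filter. This is an illustrative sketch only: the data structure, the use of "low" to represent a favorable domain-level judgement, and the two example studies are hypothetical and are not taken from Table S2.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: each domain is summarized as "low", "high", or "unclear",
# with "low" standing for a favorable ("yes" / "probably yes") judgement.
FAVORABLE = "low"


@dataclass
class ProbastRating:
    study: str
    rob: Dict[str, str]            # four RoB domains
    applicability: Dict[str, str]  # three applicability domains


def meets_quality_threshold(rating: ProbastRating,
                            min_rob: int = 3,
                            min_applicability: int = 2) -> bool:
    """Return True when enough domains are rated favorably."""
    rob_ok = sum(v == FAVORABLE for v in rating.rob.values())
    app_ok = sum(v == FAVORABLE for v in rating.applicability.values())
    return rob_ok >= min_rob and app_ok >= min_applicability


# Hypothetical ratings for two made-up studies.
study_a = ProbastRating(
    study="Study A",
    rob={"participants": "low", "predictors": "low",
         "outcome": "low", "analysis": "high"},
    applicability={"participants": "low", "predictors": "low",
                   "outcome": "unclear"},
)
study_b = ProbastRating(
    study="Study B",
    rob={"participants": "low", "predictors": "unclear",
         "outcome": "low", "analysis": "high"},
    applicability={"participants": "low", "predictors": "low",
                   "outcome": "low"},
)

selected = [r.study for r in (study_a, study_b) if meets_quality_threshold(r)]
print(selected)
```

In this sketch, Study A passes despite a high-RoB analysis domain, which mirrors the situation described in the text where missing calibration measures commonly drove the analysis rating.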
Data Synthesis
Table 2 presents the participant and study characteristics of the ten studies included following the quality assessment outlined above. All of them were conducted in mental health settings, with two targeting the child and adolescent population. The outcome in the ten included studies was assessed using the following four instruments: five studies used the Overt Aggression Scale (OAS), three the START Outcome Scale (SOS), the Staff Observation Aggression Scale-Revised (SOAS-R), and the Report for Aggressive Episodes (REFA). Finally, one study used the youth version of the Dynamic Appraisal of Situational Aggression (DASA-YV), an instrument developed to assess the likelihood of imminent aggression (within the next 24 hours) that contains subscales for both risk assessment and observed outcomes, allowing for day-to-day evaluation of aggression risk.
Population and Study Characteristics: Included Studies.
Note. NI = No information; START = Short-Term Assessment of Risk and Treatability.
Eight screening tools were used across the ten highest-quality studies; their estimates of predictive validity and reliability are presented in Table 3. Seven studies reported reliability measures, ranging from 0.74 to 1. Four instruments (HCR-20, V-RISK-10, DASA-YV, BVC-CH) showed acceptable to outstanding predictive performance for violence using suitable performance measures, while one (PCL-R) had acceptable sensitivity, specificity, and negative and positive predictive values. Evidence for DASA-YV and PCL-R came from studies with adolescent and youth participants (Dutch & Patil, 2019; Stafford & Cornell, 2003). None of the studies clearly presented calibration measures that report agreement between expected and observed probabilities. The characteristics of these tools are presented in Table 4.
Tool Characteristics of Included Studies.
Note. Sensitivity (SN) = proportion of people who were violent and were correctly predicted to be violent; Specificity (SP) = proportion of people who were nonviolent and were correctly predicted to be nonviolent; FPR = false-positive rate (probability that a person who is predicted to be violent will be nonviolent); PPV = positive predictive value (accuracy of predictions that individuals will be violent); NPV = negative predictive value (accuracy of predictions that individuals will not be violent); OR = odds ratio (the odds of a person being violent who scored above the cutoff score are X times the odds of a person who scored below the cutoff score); NI = No information.
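As an illustration of the measures defined in the note above, the sketch below derives sensitivity, specificity, PPV, NPV, and the odds ratio from a 2×2 table of tool predictions against observed violence. The class and function names and the counts are hypothetical and do not correspond to any included study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # scored above cutoff, violent
    fp: int  # scored above cutoff, nonviolent
    fn: int  # scored below cutoff, violent
    tn: int  # scored below cutoff, nonviolent


def performance(t: TwoByTwo) -> Dict[str, float]:
    """Compute the table-note measures from a 2x2 table.
    Raises ValueError when a denominator would be zero."""
    if min(t.tp + t.fn, t.tn + t.fp, t.tp + t.fp, t.tn + t.fn) == 0:
        raise ValueError("a row or column total is zero")
    if t.fp == 0 or t.fn == 0:
        raise ValueError("odds ratio undefined when fp or fn is zero")
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Odds of violence above the cutoff (tp/fp) divided by
        # odds of violence below the cutoff (fn/tn).
        "odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical counts for 200 assessed patients.
example_table = TwoByTwo(tp=30, fp=20, fn=10, tn=140)
example_metrics = performance(example_table)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the tool identifies three quarters of violent patients, yet two in five positive screens are false alarms, which shows why sensitivity and PPV need to be read together when violence is relatively uncommon.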
Risk of Violence Assessment Tools in Emergency and Mental Health Settings: Selected on Quality Assessment.
Includes START:SV, an adolescent version.
V-RISK-10 modifications were included (V-RISK-Y, V-RISK-EXT).
Although four of the 62 eligible studies used instruments developed and validated in the ED (BRACHA, QOVPRAT, ABRAT, PANSS) (see Table S2), none of them met the quality assessment criteria to be included in the final selection. DASA is the only instrument from the high-quality studies included in the review that had also previously been used in ED settings. Moreover, in one of the two included studies that had adolescent participants, this scale showed good predictive validity and can be advised for application in this population (Dutch & Patil, 2019; see Tables 2 and 3).
Importantly, in our final selection of ten studies, two studies targeted adolescents and youth with a median/mean age of 14 years (Dutch & Patil, 2019; Stafford & Cornell, 2003), while the remaining eight recruited adults with a mean age of 34 to 47 years. This may reflect the higher interest in the general population. Furthermore, six studies included a balanced representation of males and females, with females making up between 50% and 55% of participants, while three had a large majority of males (21%–27% females) (Braithwaite et al., 2010; Marriott et al., 2017; O’Shea & Dickens, 2015) and only one study included 66.7% females (Dutch & Patil, 2019). We note that these figures reflect biological sex rather than social identity. Included studies did not report whether tool performance varied with sex, though according to the evidence, males were more inclined to commit violent or aggressive acts (especially physical) (e.g., Gillespie et al., 2010; Nelson et al., 2024).
Included studies rarely presented information on the ethnicity or socioeconomic status of participants, which precluded assessment of whether the tools carry implicit biases that affect their performance based on these factors. Only one study included socioeconomic status in the prediction of aggression (Stafford & Cornell, 2003) and found that it was not a significant predictor. Other studies recorded the ethnicity of the participants in baseline characteristics but did not include this variable in analyses (Braithwaite et al., 2010; Jalil et al., 2019; Marriott et al., 2017). Ethnicity was not mentioned in most studies, and in one case the related findings were redacted from the Results and Discussion (O’Shea & Dickens, 2015). This is a significant limitation of the included research, given that ethnicity and socioeconomic status have been shown to play a significant role in workplace violence in healthcare settings (Fujii et al., 2005; Sikstrom et al., 2023).
Discussion
This systematic review reports the predictive performance of risk assessment tools for violence in emergency and psychiatric settings. We identified 62 studies that reported the predictive validity of 33 risk assessment tools. Our results focus on the 10 studies of 8 tools for which there is higher quality evidence regarding their performance; all studies were conducted in psychiatric settings.
Of the eight tools for which there is highest-quality evidence, five (HCR-20, V-RISK-10/EXT, DASA-YV, BVC-CH, PCL-R) showed acceptable to outstanding predictive validity using AUC and other performance measures. All these tools were externally validated. Unfortunately, none of the studies presented sufficient information about calibration. Though none of the included studies were conducted in the ED, one of the tools used in these studies, DASA, was also validated for use in ED settings. However, the included study used a juvenile version of DASA and recruited adolescents.
The findings show a lack of high-quality studies, which may reflect difficulty in conducting such studies in EDs and PDs. We did not find any studies that met the highest standards of quality; most were classified as high or unclear risk. This is consistent with related reviews (Sammut et al., 2023). Furthermore, previous research often included forensic populations, which are inherently different to the general population (Dickens et al., 2020; Ogonah et al., 2023; Ramesh et al., 2018). This is the first systematic review to examine the predictive validity of violence screening tools on general populations in EDs and PDs.
Violence can be operationalized in many ways, which is reflected in the design of risk assessment tools as well as in the assessment of outcomes. Some instruments discriminate between different types of violence, including verbal aggression, physical aggression against objects, physical aggression against self, and physical aggression against others (OAS; Yudofsky et al., 1986), or include scales for self-harm, suicide, substance use, victimization, absconding, and self-neglect (SOS; O’Shea & Dickens, 2015). Variability in outcome assessment across the included studies increased the heterogeneity of the findings. We attempted to overcome this by operationalizing our outcome as physical violence or any violence. We found that a more specific operationalization of the outcome was associated with better predictive validity (see Table 5). Indeed, “physical violence” showed higher predictive accuracy compared to “any violence” for START, HCR-20, and V-RISK-10 in conjunction with either OAS or REFA (Jalil et al., 2019; Roaldset et al., 2012). Likewise, studies that used SOAS-R, a tool designed to measure the severity of physical assaults, showed better predictive validity than those using non-specified violence (Eriksen et al., 2018; Rechenmacher et al., 2014).
Summary of the Review Findings and Their Implications for Practice, Policy and Research.
We found that DASA-YV and BVC-CH showed the highest predictive validity compared to other instruments, where the former used the original outcomes, measured as a subscale of the tool, and the latter used the severity of physical assaults approach of SOAS-R (Dutch & Patil, 2019; Rechenmacher et al., 2014). Next, HCR-20 (Clinical Scale, a total score) and V-RISK-10-EXT showed excellent predictive validity for physical violence, whereas their performance was acceptable for any violence (Eriksen et al., 2018; Jalil et al., 2019; Sada et al., 2016).
The tool tested most often in included studies, START, had poorer predictive ability for any aggression and for physical aggression using SOS and OAS (Braithwaite et al., 2010; Jalil et al., 2019; Marriott et al., 2017; O’Shea & Dickens, 2015). PCL-R showed poor to acceptable predictive validity for both any violence and physical violence (Jalil et al., 2019; Roaldset et al., 2012; Stafford & Cornell, 2003). The only instrument that measured unauthorized leaving (of the facility) was SOS, and its predictive ability was poor (Braithwaite et al., 2010; O’Shea & Dickens, 2015).
Our findings on predictive validity are consistent with earlier studies. A recent meta-analysis of the original BVC estimated a pooled area under the curve of 0.83 (95% CI [0.78, 0.87]) in a subset of 15 studies (Hvidhjelm et al., 2023). In line with our results, Kös et al. (2024) found that, though both BVC and V-RISK-10 demonstrated excellent to outstanding prediction, the former had stronger predictive validity for short-term violence, while the latter was more predictive of long-term violence. Earlier tool-specific systematic reviews and meta-analyses indicated that the HCR-20 Clinical scale and total scores were good predictors of physical aggression, although their accuracy varied across different patient profiles (e.g., inpatient women) (O’Shea et al., 2013; Rossdale et al., 2020).
We found that START, although a popular tool, consistently showed poor predictive performance. In contrast, O’Shea and Dickens (2014) reported, in their systematic review focused on START, that its risk estimates demonstrated strong predictive validity for various aggressive outcomes. However, that review raises some methodological concerns, given that no standardized quality assessment tool for predictive models (e.g., PROBAST) was used and the authors did not adjust the findings for the quality of the studies.
Implications
The findings from this review have implications for research, policy, and practice (see Table 5). We recommend the DASA and BVC as short-term structured risk assessment tools with the best predictive validity based on the higher quality evidence. Of the two instruments, DASA has been used in PD and ED settings, so it is likely applicable to both contexts. However, there is a lack of quality studies to support its use in the adult population, so our recommendation applies to adolescents only. Most importantly, we recommend that the selection of a risk assessment tool should consider the following: (a) the target population (age, clinical factors), (b) the violence operationalization, and (c) the purpose of the monitoring (horizon of prediction, long term vs short term, and evaluation frequency). As a measure of long-term risk of violence, we suggest V-RISK-10, as it was highly effective for less frequent assessment. Our findings also indicate that predictive performance is related to the specificity of the outcome; researchers should clearly define how violence is operationalized and provide justification for their definition.
The use of risk assessment tools by itself will likely not prevent or reduce violence; risk assessment needs to be part of an integrated strategy that includes staff training in tool administration and clear risk management guidelines (Viljoen et al., 2018). We encourage workplaces to combine the use of the tool with appropriate staff training and education, as well as developing and implementing workplace values that support staff safety. Moreover, violence risk assessment requires a supportive workplace climate and should be implemented in combination with strategies that can effectively address and reduce potential or future harm (Florisse & Delespaul, 2020; Wand, 2012).
Limitations and Future Directions
This study has several strengths, chiefly its breadth and methodological rigor. Our searches identified a very large number of tools; however, we focused our analysis on those with the strongest supporting evidence. For quality assessment, this study adopted PROBAST, a robust RoB tool that has not previously been applied to this question in healthcare settings. Moreover, the findings apply beyond a single setting, which can assist consistency in practice. However, only a limited number of studies were of good quality based on the PROBAST tool, and none of the included studies contained calibration measures; other studies had poor descriptions of research methodology, in particular study design. There are many measures and operationalizations of violence, with varying specifications of violence toward staff. This led to heterogeneity and complicated the synthesis. The small number of studies that met the quality criteria, together with this heterogeneity, prevented us from conducting a meta-analysis.
Our study showed that studies in the field do not measure the calibration of prediction tools. The quality of information to guide decisions about the implementation of these tools in practice would be greatly improved by adding this methodological step. Therefore, future studies focused on predictive validity in these settings should adopt a higher level of methodological precision in research design, research coordination, analysis, and reporting to produce transparent and high-quality outputs. Future research should also record the ethnicity and socioeconomic status of participants and examine these factors as predictors in the analysis, given that factors like ethnic background and socioeconomic status have been reported to cause bias in the perception of patient aggression (Fujii et al., 2005; Sikstrom et al., 2023).
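To indicate what reporting calibration could involve, the following is a minimal sketch under stated assumptions: a tool or model yields a predicted probability of violence for each patient, and the observed outcome is coded 1 (violent) or 0 (nonviolent). It computes the Brier score and a simple observed-minus-expected version of calibration-in-the-large; a full analysis would typically also include a calibration plot or slope. The function names and data are hypothetical.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome.
    0 is perfect; lower is better."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_in_the_large(probs: Sequence[float],
                             outcomes: Sequence[int]) -> float:
    """Observed event rate minus mean predicted probability.
    Positive values suggest the tool under-predicts violence on average;
    negative values suggest over-prediction. This is a simplified
    stand-in for the intercept-based definition."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    observed_rate = sum(outcomes) / len(outcomes)
    mean_predicted = sum(probs) / len(probs)
    return observed_rate - mean_predicted


# Hypothetical predicted probabilities and observed outcomes.
example_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
example_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

example_brier = brier_score(example_probs, example_outcomes)
example_citl = calibration_in_the_large(example_probs, example_outcomes)
print(f"Brier score: {example_brier:.3f}")
print(f"Calibration-in-the-large: {example_citl:.3f}")
```

In this made-up example the mean predicted probability matches the observed event rate, so average calibration looks fine even though individual predictions are imperfect, which is why discrimination (AUC) and calibration are usually reported side by side.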
Future measurement of violence risk needs to accurately capture the root of the risk without circular reasoning (predicting future behavior from past behavior) (Mistler et al., 2024). This approach can improve the accuracy of prediction, usability, and interpretability of the assessment. In terms of future directions, recent studies have also incorporated technical developments using Artificial Intelligence methods such as machine learning. These include examining health records to predict aggression in psychiatric inpatients, with promising results reported (Tay et al., 2022). These methods may be particularly useful in settings with high patient volumes and reliable data capture systems.
Supplemental Material
sj-docx-1-tva-10.1177_15248380251358224 – Supplemental material for Predictive Validity of Violence Screening Tools in Emergency and Psychiatric Services: A Systematic Review
Supplemental material, sj-docx-1-tva-10.1177_15248380251358224 for Predictive Validity of Violence Screening Tools in Emergency and Psychiatric Services: A Systematic Review by Sviatlana Kamarova, Simon R. E. Davidson, Christopher M. Williams, Mariana Leite and Steven J. Kamper in Trauma, Violence, & Abuse
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval and informed consent statements
Not applicable.
Data availability statement
Not applicable.