Imperfect Tools: A Research Note on Developing, Applying, and Increasing Understanding of Criminal Justice Risk Assessments

Abstract

This article shares considerations for designing, implementing, and understanding risk assessments used to reduce recidivism of people under community supervision. These insights are gleaned from 27 data scientists who participated in focus groups during the National Institute of Justice’s Recidivism Forecasting Challenge Winners Symposium. Analyses revealed three primary themes: design considerations, implementation, and increasing awareness and understanding of risk assessments. Critical aspects of the design phase include validating the tool, incorporating field data that account for real-time changes, and adopting strategies to address false positives/negatives and the model’s complexity. Once the tool is developed, practitioners are advised to devise an implementation plan, balance attention to risk with client-focused needs, and exercise modest discretion while considering algorithmic results. Recognizing the value predictive instruments bring to decision-making and identifying their limitations is needed to increase understanding for all stakeholders. Collaboration and dialogue between tool developers and practitioners are crucial at every stage.

Keywords

risk assessments, machine learning, design, implementation

Introduction

Despite years of research, many important questions remain regarding risk assessment development, use, and the communication of results to involved parties. Actuarial risk assessments are imperfect tools meant to inform but not dictate appropriate responses. An important area of inquiry is how to help criminal justice stakeholders use risk assessments without harming those subjected to their results. This article seeks to move the discussion forward by drawing on conversations from the National Institute of Justice’s (NIJ) Recidivism Forecasting Challenge Winners Symposium and examining recent literature in this area.

Actuarial risk assessments are commonly used across the criminal justice continuum, from pretrial detainment to postincarceration community supervision. Within the criminal justice system, risk assessments predict the likelihood that someone involved with the system will engage in criminal activity in the future. These tools affect who is incarcerated, how agencies spend their funding (e.g., service development and delivery), and how practitioners approach case management (Werth, 2019). While the use of risk assessments in criminal justice is far from new, the frequency of use, methods used to develop them, and the diversity of applications have grown rapidly over the last few decades. Concerns regarding accuracy, ethical and legal implications, and the potential of increasing racial and gender disparities have accompanied this practice. Technological advancements have facilitated applying more sophisticated methods (e.g., machine learning) to produce risk projections, but the aforementioned concerns remain if they are not specifically addressed. Agencies must determine what process and information will go into developing these tools, how to implement them best, and appropriate means to effectively communicate their use to stakeholders and the public. Users of these tools are also responsible for demonstrating their utility while being transparent regarding their limitations.

Risk assessments, including ones informed through machine learning, need to be accurate, transparent in their procedures, and responsive to possible errors. Formal guidelines to achieve these goals are often lacking (Berk & Hyatt, 2015). This can create challenges for criminal justice practitioners and administrators who are under increasing pressure to comply with mandates requiring these tools, even while the number of risk assessment tools to choose from continues to grow (Desmarais, 2017). These tools vary in complexity and the rigor with which they have been tested. Practitioners and administrators must also contend with the lack of a one-size-fits-all approach or “best” tool and a paucity of knowledge to guide effective implementation (Desmarais, 2017; Desmarais et al., 2016). Scholars have noted the lack of research examining the implementation of risk assessments in the criminal justice field, the absence of regulations on how to use them, and the lack of review and approval by researchers and legal bodies (Garrett & Monahan, 2019, 2020; Starr, 2015).

Proponents of risk assessments view them as a means to improve the criminal justice system’s effectiveness, account for potential biases, reduce system involvement for low-risk individuals, and decrease the number of people incarcerated. Critics warn risk assessments may legitimize mass incarceration and further disadvantage individuals overrepresented in the criminal justice system (Werth, 2019). This debate is further complicated given that only some entities’ procedures and algorithms are made public, while others are kept proprietary, limiting stakeholders’ abilities to know all that goes into producing a risk classification. In addition, the wide range of developers, including for-profit, nonprofit, academic, and correctional departments, can make it difficult to reach a consensus on best practices (Werth, 2019).

NIJ Recidivism Forecasting Challenge Winners Symposium

In Spring 2021, NIJ (n.d.) held the Recidivism Forecasting Challenge. The Challenge was nationally advertised, and those interested in participating in the data forecasting competition could compete individually or as a team. The primary aims of the Challenge were to increase public safety and the fair administration of justice by advancing the field’s ability to forecast and understand factors related to recidivism. Using data sets provided by the state of Georgia, contestants were tasked with building models predicting if a person on parole would recidivate over 3 years. A total of US$723,000 in prizes were given to entries with the most accurate forecasts of recidivism for males versus females, overall accuracy for gender, and fairness and accuracy (which accounted for false-positive rates by race). Results from the Challenge have previously been released (NIJ, 2021; White et al., 2022).

Upon completion of the Challenge, the winners provided a written report describing their methods and strategies used in the competition. These reports are available on NIJ’s website.¹ A review of the winning reports (White et al., 2023) and summary of the Challenge results (Hudgins et al., 2022) have also been completed. The winning contestants were also invited to attend a 2-day virtual Winners Symposium in December 2021. Day 1 of the symposium involved the discussion of themes around race and gender bias in risk assessments, including eliminating bias, improving fairness in risk assessments, and gender-specific needs. Highlights from Day 1 are provided in this article (see “Acknowledging Bias” section); however, a more in-depth discussion of these topics can be found in Rief and colleagues (2023). This research note aims to describe and paraphrase the data scientists’ discourse on Day 2 of the symposium, which focused on practical implications, and situate this discussion within the broader literature on risk assessment.

Method

All Challenge winners were invited to attend the 2-day Winners Symposium. On the second day of the symposium, 27 of the 56 Challenge winners participated in a series of focus groups. These discussions were held virtually in three Webex breakout rooms across two 45-min sessions. Each breakout room included a group of Challenge winners, one NIJ facilitator, and one professional notetaker. In Session 1, participants were asked to discuss how to apply the results gained from the Challenge into practice. An example of a question posed during this forum was, “How can we make this type of work [development of risk forecasting models] more beneficial to practice?” Session 2 addressed general data science applications with questions such as “Does a more complex model mean we are moving forward in corrections and being able to predict these important [risk] outcomes?” Following the breakout sessions, all attendees shared their final thoughts in a large group meeting presided over by two NIJ staff members and two notetakers.

We compiled the notes taken during each focus group (seven total) to identify key themes and subthemes. Thematic analysis was performed with a by-hand review of the focus group notes, each of which ranged between four and seven pages in length. This analysis was conducted separately by the first two authors. Authors met consistently to discuss the review process, exchange memos and analytic charts, and compare observations with existing evidence. Upon completing the analysis and reconciling any differences, three major themes were identified about risk assessments: (a) design considerations, (b) implementation, and (c) increasing awareness and understanding of risk assessments among practitioners and the public. We discuss these themes in the following sections (for a summary, see Table A1 in the appendix).

Key Takeaways From Focus Groups

Design Considerations

Focus group participants identified several considerations for agencies and relevant stakeholders during the design phase of a risk assessment tool. These included forging working partnerships, incorporating contextual and real-time information, anticipating errors, and determining model complexity. Participants based their recommendations on the steps used to produce their winning submissions, which predominantly employed machine learning techniques, and on their professional experiences.

Forging Working Partnerships

Participants emphasized the importance of the implementing organization being an active partner in the design process. By routinely meeting with developers and discussing the goals of the desired tool, users can provide input on design considerations and feedback on field tests. This can help scientists revalidate and improve model performance. These partnerships also may increase the agency’s ability to understand what the instrument does and how best to use the results to inform practice, thereby facilitating the construction of a finer tool (Levin et al., 2016). Having this working relationship can reduce the gap between the developer’s knowledge about what practitioners should be doing with the tool and the practitioners’ actual execution of the tool in their operations.

Incorporating Contextual and Real-Time Information

Beyond the daily use of the tool, participants stressed the importance of understanding the larger system in which the tool will operate, keeping the tool updated to maintain its relevance to an ever-evolving landscape, and evaluating its live functioning. Each of these considerations requires stakeholders to effectively communicate with one another and maintain working partnerships. Through maintaining these relationships and being mindful of the context the tool will be utilized in, the tool’s utility and performance will be increased.

Ritter (2013) offers practical steps to guide stakeholders in developing and implementing machine learning risk assessments. She discussed jurisdictions building their own risk assessment tools and factoring into development their unique data circumstances, available resources, and the political environment in which the tool will be implemented. The MnSTARR, which was developed for the Minnesota Department of Corrections (MnDOC; Duwe & Clark, 2013), is an excellent example of applying these principles. Before its development, MnDOC used the Level of Service Inventory-Revised (LSI-R) to assess risk and need. When the performance of the two tools was compared, the MnSTARR outperformed the LSI-R, leading MnDOC to switch tools (Duwe, 2021). The Modified Positive Achievement Change Tool (M-PACT), developed for Washington State (Hamilton et al., 2019), has also shown that tailoring a tool to the local jurisdiction can improve performance. Focus group participants echoed similar sentiments, insisting that creating one grand tool that works seamlessly across different domains is infeasible. Instead, developers should customize tools according to specific jurisdiction and population needs.

Focus group participants repeatedly emphasized the necessity of ongoing validation, recognizing that revalidation of the tool must fit within the implementing agency’s capacity to carry it out. If agencies can, they are urged to make validation of the risk instrument a priority by having practitioners report how well the models apply to the communities they serve. Relatedly, participants suggested training models on multiple data sources and calibrating them for the locality in which the tool will be implemented. Berk and Hyatt (2015) advised that the same predictors and outcomes used in the training data (i.e., the data used to build the model) also be used when evaluating the test data (i.e., the data used to validate the model or represent real-world application).

Participants also noted risk assessments that account for change and incorporate real-time data might be more accurate. Models cannot indefinitely work as originally designed. Although often treated as a static factor (i.e., it does not change), risk can fluctuate over time (Desmarais, 2017). The corrections data used to construct the model may also change based on new inputs and developments in the field. Therefore, reaching saturation is not a practical goal. The participants expressed unanimous support for constructing models using dynamic factors, as opposed to static factors, with time-stamped variables so scientists can know the circumstances and events increasing or decreasing risk. For example, one participant suggested incorporating information obtained during parole visits such as employment information. In a review of the winning teams’ reports, it appeared that dynamic variables (e.g., percent of time employed once released and percent of positive drug tests) were more predictive than prior felony arrests and prior gang affiliation variables (White et al., 2023). The broader literature also highlights the limitations of static factors. Static factors have little utility beyond screening or general categorization (Latessa & Lovins, 2010), likely overpredict risk (Berk & Hyatt, 2015; Hamilton, 2015), and do not account for real-time behavioral changes (Goodley et al., 2021). Research also suggests that dynamic and static risk factors may vary in their ability to predict recidivism when comparing White and Black individuals (Miller et al., 2022).

Anticipating Errors and Determining Model Complexity

In her review, Ritter (2013) further underscored the need to determine the cost of false positives (e.g., predicting someone will recidivate when they do not) and false negatives (e.g., predicting someone will not recidivate when they do) and to establish acceptable distributions of these errors before development. Which error the agency or developer determines is more critical to avoid will shape the tool’s design. The focus group participants maintained there are costs associated with both error types, and preemptive measures should be taken to address them. Traditionally, developers will overweigh false negatives to reduce their occurrence (Hermstrüwer, 2019; Latessa & Lovins, 2010). Participants argued greater efforts should also be devoted to lowering false-positive rates. Scrutinizing the model’s data and quantifying false-positive and false-negative results regarding their severity and risk to society were also recommended.
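The trade-off between the two error types can be made concrete with a minimal Python sketch. Everything in it is an assumption for illustration: the predicted probabilities, the observed outcomes, the candidate thresholds, and the cost weights are hypothetical and do not come from the Challenge data or from any participant’s model. An agency would substitute the costs it agreed on before development.

```python
def confusion_counts(probs, outcomes, threshold):
    """Count outcomes when everyone at or above `threshold` is flagged as likely to recidivate."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, outcomes):
        flagged = p >= threshold
        if flagged and y == 1:
            counts["tp"] += 1
        elif flagged and y == 0:
            counts["fp"] += 1  # predicted to recidivate, did not
        elif not flagged and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # predicted not to recidivate, did
    return counts


def weighted_cost(probs, outcomes, threshold, cost_fp, cost_fn):
    """Total cost of errors at a threshold, given a cost per error type."""
    c = confusion_counts(probs, outcomes, threshold)
    return cost_fp * c["fp"] + cost_fn * c["fn"]


def cheapest_threshold(probs, outcomes, candidates, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest weighted error cost."""
    return min(
        candidates,
        key=lambda t: weighted_cost(probs, outcomes, t, cost_fp, cost_fn),
    )


# Hypothetical scores and outcomes (1 = recidivated, 0 = did not).
PROBS = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
OUTCOMES = [0, 0, 1, 0, 1, 0, 1, 1]
CANDIDATES = [0.3, 0.5, 0.75]

print(confusion_counts(PROBS, OUTCOMES, 0.5))
# Weighting false negatives more heavily favors a lower threshold...
print(cheapest_threshold(PROBS, OUTCOMES, CANDIDATES, cost_fp=1, cost_fn=3))
# ...while weighting false positives more heavily favors a higher one.
print(cheapest_threshold(PROBS, OUTCOMES, CANDIDATES, cost_fp=3, cost_fn=1))
```

In this toy data set, shifting the cost weights alone moves the preferred cutoff from 0.3 to 0.75, which illustrates why the acceptable distribution of errors is best settled before the tool is built.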

Discussion also ensued about whether scientists should make predictions with simple or complex models. Though one might expect complex models to perform better, one participant countered that chasing greater accuracy at the fourth to sixth decimal place (a necessity to win this competition) provided little to no practical value. Participants acknowledged that simpler models might be less accurate but are easier to explain and employ and may more clearly identify the factors most associated with recidivism. Still, a balance is likely required to provide an accurate tool within the agency’s means to develop and interpret, as reflected in the caution that models be “simpler but not too simple” (Hermstrüwer, 2019, p. 208).

In a recent study, Kigerl and colleagues (2022) tested several methods of predicting recidivism. Results suggested the sample size was more important than the type of algorithm, with an optimal range falling between 5,000 and 10,000, while regression-based methods performed nearly as well, providing greater transparency and ease of use. The decision to select simple or complex models is critical, as the trade-off between accuracy and utility has major implications for practitioners and tool developers. Determining at which point, if any, marginal gains in accuracy detract from usability and implementation is a significant question in need of further exploration. An accompanying question is whether resources should begin to shift away from developing increasingly accurate predictions and instead turn to examining the implementation process.

Implementation

The research on implementing and evaluating risk assessments is relatively sparse, and suggestions directing implementation efforts are rare (although see Garrett & Monahan, 2020; Ritter, 2013). Herein, we present the advice of the Challenge participants regarding ways to facilitate the integration of risk assessments into criminal justice processes. Recommendations included clarifying goals at the outset, attending to risk and need, and managing practitioner (non)compliance.

Clarifying Goals at the Outset

As noted earlier, collaboration between tool developers and practitioners is critical for enhancing comprehension and developing risk instruments. This partnership remains essential during preparation for implementation or the initial implementation stage. Participants stressed the necessity for developers and practitioners to align their goals for using risk assessments at the outset of implementation. Each party needs to understand how the other intends to employ the tool and reach a consensus on its application.

Before the risk instrument is put into practice, scientists and practitioners should devise an implementation plan for targeted use and analysis of the results, allowing room for adjustments based on organizational needs (Levin et al., 2016). Scientists can facilitate the plan’s fruition by ensuring decision-makers understand the prediction results. Models should also be subjected to an extensive review process (e.g., by institutional review boards) before implementation. This review can serve as a safeguard that the models work as intended. When the models are integrated into criminal justice procedures without clarification of goals or critical review, decision-makers might distrust the tool and return to prior practices. Such models could also have devastating consequences for individuals and communities deemed “high risk” (Berk, 2017; Garrett & Monahan, 2020; Werth, 2019).

Attending to Risk and Need

Beyond articulating objectives, participants described the challenge of translating risk assessment results into actionable responses (i.e., what practitioners should do on the ground). Implementing organizations need more than quantified information on who is low or high risk when they employ risk instruments in their decision-making. They need guidance on how to apply the information to various decisions based on the delivered output.

Risk instruments can do more than identify who is at risk of recidivating: They can help direct resources according to individuals’ needs. Participants and the literature suggest practitioners using predictive technologies should focus more on the criminogenic needs and protective factors moderating risk (Barnes-Lee, 2020; Nelson & Vincent, 2018; Wormith, 2017). If practitioners only attend to risk, they may miss critical aspects of individuals’ characteristics and experiences contributing to their recidivism (Barnes-Lee, 2020). Machine learning, with its adaptability, has the potential to incorporate live and updated data to reflect individuals’ needs. For example, while models are ordinarily constructed using aggregate or geographic data, like Census tracts, participants argued having models incorporate micro-level data could yield more robust information. They cautioned that a macro-only focus could trigger heightened crime control strategies such as hyper-surveillance and policing in high-risk populations/communities. The best illustration of this issue comes from Berk (2009), who discovered greater racial bias when using residential zip code variables to forecast homicides committed or attempted by those under correctional supervision. Based on the results, which indicated an elevated rate of failure for Black individuals compared with White individuals on probation and parole, he determined that, even if race is not factored into a model, aggregate information such as the zip codes where individuals are released can be used as a proxy for race, particularly in highly segregated areas. Seeing the potential for unintended discriminatory responses, Philadelphia’s Adult Parole and Probation Department removed the zip code variables from its risk-forecasting tool, which previously used them for several years (Popp, 2017).

It is important to remember variation exists among supervised individuals, which makes a bird’s eye view of prediction results insufficient. Participants suggested scientists explore a micro-level approach that considers individuals’ life trajectories, residence, and expectations about recidivism to obtain a fuller view of individual protective and risk factors. From there, corrections agencies and officers could use this insight to guide personalized interventions. Such action is necessary as evidence suggests individuals at different risk levels sometimes receive improper services, which has implications for recidivism (Nelson & Vincent, 2018).

Managing Practitioner (Non)Compliance

Although practitioner use of risk assessments is understudied, available research finds practitioners do not always use the tools as intended. Garrett and Monahan (2020) found high variance in how judges perceived and used the Nonviolent Risk Assessment Instrument; some used the instrument as an aid they could sometimes override; others felt uncomfortable using it altogether. An NIJ-funded study by Berk (2017) revealed similar behavior when evaluating the effect of machine learning forecasts on parole release decisions. He observed that while the forecasts led to more accurate decisions based on recidivism risk for specific individuals, parole board members still largely reverted to prior procedures that invoked their discretionary powers. This tendency to ignore the risk instrument output in daily operations, known as foot-dragging, may be due to practitioners’ fears of deskilling (Brayne & Christin, 2021).

Focus group participants suggested practitioners need to be reassured that risk instruments aid, rather than determine, decision-making. These instruments do not eliminate but aim to better guide discretion (Brayne & Christin, 2021). Participants acknowledged practitioners might need to exercise more discretion in some situations, despite the use of risk assessments. Although risk assessments have an established degree of accuracy (Berk, 2017; Garb & Wood, 2019; Ghasemi et al., 2021), they are imperfect because individuals may not always align with their risk profiles. The predictive accuracy of risk models can also vary depending on factors such as the purpose of application (Fazel et al., 2012) and the data used to build them (Tonry, 2019). With evidence suggesting algorithmic risk assessments have higher validity than clinical or subjective judgments, there is a need to strike a balance in which practitioners neither overestimate nor underestimate risk classification scores (Berk & Hyatt, 2015; Garb & Wood, 2019). Practitioners should incorporate risk assessments into their toolbox while using discretion sparingly to maximize the best interests of supervised individuals.

To encourage the adoption of the tools and reduce misapplication, participants recommended continuous monitoring of practitioners’ use of risk assessment. Such evaluation can occur via judicial review, managerial supervision, and scientific review by researchers (Brayne & Christin, 2021; Garrett & Monahan, 2020). Practitioners also require training and guidelines to improve the tool’s implementation while keeping in mind their agency resources, which are needed to fulfill compliance (Viljoen et al., 2018). Finally, some participants proposed that if funding is obtained, police departments and corrections facilities could develop an interface that continuously updates in real-time with new information about individuals under supervision to alleviate the burden practitioners might otherwise feel when dealing with complex models.

By following these suggestions, risk assessments might become more dependable actuarial assistants in criminal justice decision-making. Nevertheless, achieving broad implementation of these instruments requires increasing understanding of how they function. When all parties involved are properly informed about the tools’ results, the tools may better fulfill their stated purpose.

Increasing Awareness and Understanding

Participants emphasized the need for practitioners, policymakers, and the public to better understand risk assessments’ development, their intended purpose, and how they support the decision-making process. Selbst and Barocas (2018) discussed the value of people being able to interpret what the result means, knowing what needs to occur to achieve a different result, and possessing the ability to debate the validity of the results. Achieving these goals can be a lofty task filled with nuances, trade-offs, and limitations. Despite the difficulties this may hold, working to improve stakeholders’ abilities to communicate these aspects effectively is a goal worth pursuing.

Fostering Understanding, Community Trust, and Transparency

Participants recommended stakeholders recognize predictive tools’ improvement over subjective decision-making while acknowledging their limitations. Participants noted people can either expect too much from algorithms, fueled by overemphasis on their accuracy, or be overly apprehensive and skeptical of the value they offer. It is imperative to increase general awareness and understanding of why the tools are used and what reasonable expectations are regarding their performance. Essential to achieving these tasks is increasing transparency on how results are processed, which can help build public trust in the assessments.

Participants provided several suggestions for increasing transparency and trust. Among them were having agencies and developers provide comprehensible explanations of how the tool works, what the results mean, and openly reporting the predictive performance of the tool (Garrett & Monahan, 2020). One means of achieving these goals is providing the programming code for the algorithm used to create the risk predictions so people know why someone received the risk score they did, along with literature that links the codes to recidivism (Garrett & Monahan, 2020; Popp, 2017). Providing the programming code also assists others in reviewing the methods used to develop the tool (Selbst & Barocas, 2018). Tools built using regression-based methods have an advantage in transparency. They allow practitioners to examine the weight of each factor in the produced prediction (Kigerl et al., 2022) and help avoid the “black box” effect (i.e., the opacity that occurs when you are unsure of how you received the outcome classification based on the inputs) that often accompanies machine learning techniques (Burrell, 2016). While practitioners do not have to understand all the mathematical procedures, they must be able to communicate what is behind the risk prediction score (Hermstrüwer, 2019). This awareness can help them inform people under correctional supervision about what affects the algorithm and how decision-makers will interpret and utilize results. Having this knowledge will improve practitioners’ ability to implement the tool effectively and ethically.
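The transparency advantage of regression-based tools can be illustrated with a minimal Python sketch of a logistic model whose score decomposes into per-factor contributions. The intercept, the coefficient values, and the example person are hypothetical and chosen only for illustration; the variable names echo the kinds of factors discussed in this note but are not taken from any real instrument.

```python
import math

# Hypothetical log-odds weights; not estimated from any real data set.
INTERCEPT = -1.0
WEIGHTS = {
    "prior_felony_arrests": 0.30,
    "pct_days_employed": -1.50,
    "pct_positive_drug_tests": 2.00,
}


def explain(person):
    """Return each factor's contribution to the log-odds and the predicted probability."""
    contributions = {name: WEIGHTS[name] * person[name] for name in WEIGHTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return contributions, probability


# Hypothetical person: 2 prior felony arrests, employed 80% of days since
# release, 10% of drug tests positive.
PERSON = {
    "prior_felony_arrests": 2,
    "pct_days_employed": 0.8,
    "pct_positive_drug_tests": 0.1,
}

contributions, probability = explain(PERSON)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f} log-odds")
print(f"predicted probability: {probability:.3f}")
```

Because each contribution is simply a weight multiplied by an input, a practitioner can see which factors raised or lowered the score and by how much, which is the kind of explanation that more opaque models do not offer as directly.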

Once the tool is implemented, focus group members stated that organizations should show how applying the tool benefits the person being assessed and the community at large. This is accomplished by the agency using the assessment to determine and provide appropriate amounts of services and supervision to individuals under their jurisdiction. The participants also suggested making the risk instrument’s effect on recidivism and public safety publicly accessible in a transparent and interpretable manner.

Communicating Risk

Another topic participants discussed was the impact of how risk classifications are labeled and communicated. Research finds that practitioners and the public prefer risk information presented categorically (e.g., low, high) rather than numerically (e.g., frequencies, probabilities; Garrett & Monahan, 2020; Krauss et al., 2018). However, risk categories can be misleading, as they do not neatly correspond to fixed percentage likelihoods (e.g., “high risk” correlating to a 90% chance of recidivating), nor are they accompanied by the probability of the behavior occurring. The value of the risk levels/labels may differ depending on the tool in use, the specific target population, and the cutoff points developers choose (Klingele, 2020). Practitioners relying on such categories may overestimate recidivism risk (Green, 2020; Krauss et al., 2018). For decision-makers to effectively use the tool, risk levels and how classifications were operationalized must be clearly communicated (Hu et al., 2021). Participants discussed an alternative to using risk categories: rank ordering those under supervision according to their likelihood of recidivism, and then focusing on a percentile of the highest-risk individuals. Individuals can then be re-ranked based on dynamic, real-time information indicating any changes in their risk profile.
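The rank-ordering alternative participants described can be sketched in a few lines of Python. The identifiers, scores, and the 40% cutoff below are hypothetical and chosen only to keep the example small; the sketch assumes a model has already produced a recidivism likelihood for each person.

```python
def top_percent(scores, percent):
    """Return the IDs in the highest-scoring `percent` of people, highest first.

    `scores` maps a person ID to a predicted likelihood of recidivism;
    `percent` is a whole number from 0 to 100. The group size is rounded up.
    """
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    k = -(-len(ranked) * percent // 100)  # ceiling division on integers
    return ranked[:k]


# Hypothetical likelihoods at the start of supervision.
SCORES = {"A": 0.62, "B": 0.15, "C": 0.48, "D": 0.81, "E": 0.33}
print(top_percent(SCORES, 40))

# Re-rank after new, time-stamped information changes one person's score
# (for example, a hypothetical update following a parole visit).
UPDATED = dict(SCORES, C=0.90)
print(top_percent(UPDATED, 40))
```

With these toy values, the focus group shifts from D and A to C and D after the update, showing how a ranking can respond to dynamic information without relying on fixed category labels.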

Acknowledging Bias

Another crucial step in building awareness and understanding of these tools is to acknowledge the potential bias in the data and its effect on assessing risk. Equity is a policy issue for which computer science cannot provide all the answers, and it remains important to evaluate and discuss (Berk et al., 2021; Popp, 2017). Focus group participants touched on these topics, noting that bias might already exist in the available data when risk assessments are developed. All parties need to recognize the limitations this can create and the implications for those involved in the criminal justice system. As a result, they recommend developers, when possible, identify the potential source of bias and communicate its presence to stakeholders. Developers should also assess models for racial bias and gender responsivity. A thorough examination of ways to account for bias, race, and gender is beyond the scope of this article but will be addressed in a forthcoming paper that covers Day 1 of the Winners Symposium (Rief et al., 2023). As noted, Day 1 of the symposium featured discussions on bias and gender responsiveness of risk assessments. The forthcoming paper reviews discussion points on challenges stemming from attempts to reduce bias in risk assessment, such as competing definitions of fairness and consideration of gender-specific needs, and how they relate to current research. Challenge participants did not need to address these topics as part of their entry into the competition, but they may have been a part of their approach in forecasting recidivism for males versus females separately, overall accuracy for gender, and fairness and accuracy (accounted for through false-positive rates by race).
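One simple way to begin assessing a model for disparate errors is to compare false-positive rates across groups. The Python sketch below is illustrative only: the group labels, scores, outcomes, and the 0.5 threshold are hypothetical, and the calculation is not presented as the Challenge’s scoring formula.

```python
def fpr_by_group(records, threshold=0.5):
    """Return the false-positive rate per group.

    `records` is a list of (group, predicted_probability, outcome) tuples,
    where outcome is 1 if the person recidivated and 0 otherwise. The rate is
    the share of people who did not recidivate but were flagged anyway.
    """
    flagged = {}
    negatives = {}
    for group, prob, outcome in records:
        if outcome == 0:
            negatives[group] = negatives.get(group, 0) + 1
            if prob >= threshold:
                flagged[group] = flagged.get(group, 0) + 1
    return {g: flagged.get(g, 0) / n for g, n in negatives.items()}


# Hypothetical records for two groups labeled g1 and g2.
RECORDS = [
    ("g1", 0.7, 0), ("g1", 0.2, 0), ("g1", 0.6, 1), ("g1", 0.4, 0), ("g1", 0.8, 0),
    ("g2", 0.3, 0), ("g2", 0.6, 0), ("g2", 0.9, 1), ("g2", 0.1, 0), ("g2", 0.2, 0),
]

rates = fpr_by_group(RECORDS)
gap = abs(rates["g1"] - rates["g2"])
print(rates, gap)
```

A gap between groups does not by itself identify the source of bias, but reporting it gives stakeholders a concrete figure to discuss alongside overall accuracy.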

Discussion

Risk assessments have been a routine part of criminal justice practice for decades. Much has been accomplished in the development of these tools, but questions remain regarding ways to improve their development, use, and perception. In this research note, we offer practical insights on these matters using the knowledge and expertise of data scientists participating in the 2021 NIJ Recidivism Forecasting Challenge Winners Symposium. Through thematic analysis of focus groups, we identified three central themes: (a) designing risk instruments, (b) facilitating implementation, and (c) heightening awareness and understanding.

Participants provided important recommendations to consider when designing an instrument. Collaboration between tool developers and practitioners should occur. Developers should train the instrument on multiple data sources (e.g., those accounting for contextual and real-time information) and conduct repeated testing and validation of the instrument based on user feedback. It is also critical to determine how false positives, false negatives, and model complexity will be addressed.

Participants further underscored the importance of developers and practitioners clarifying how the tool will be implemented. Underestimating or overestimating risk assessment scores can impair the predictive performance of the instruments and the quantity and quality of resources provided. Practitioners should exercise discretion strategically, devote increased attention to need factors moderating risk, and regularly report on their use of risk assessments.

Participants also discussed ways to increase understanding of risk assessments. Practitioners and the public could benefit from knowing the predictive instruments’ value to decision-making, particularly over subjective judgments. Greater care is needed in communicating these tools’ flaws, such as the potential for bias or error. Symposium conversations were also replete with calls for transparency, open access to results, interpretable explanations of results, and actionable reports.

Limitations

We note a few limitations of this work. First, the themes discussed here come from recorded notes rather than direct transcripts. Although the notes capture the main ideas discussed, they may exclude small details. Second, the themes originate from structured focus groups that were 45 minutes long, so time constraints limited the breadth of topics covered. Third, the themes compiled for this article were based on the authors’ interpretations of the notes from the symposium. To reduce interpretation bias, the first two authors cross-checked the identified themes and then shared the results with the larger group of collaborators on the Challenge project for further review. Although not a limitation of the themes reviewed, we must acknowledge that the Challenge occurred under different circumstances than those involved in the “real-world” creation of a risk assessment tool. Developers and practitioners likely have the same goal when creating and adopting a risk assessment tool: obtaining the fairest and most accurate risk predictions. However, Challenge participants did not face the same constraints as individuals creating usable risk assessment tools, and most were not practitioners with experience implementing these tools.

Policy Considerations

The recommendations discussed have several implications for policy. While the participants generally supported using risk assessments in criminal justice practices, they also acknowledged that these tools might experience pushback from practitioners and other stakeholders. Policymakers, agencies, media outlets, and the public may misuse or misinterpret the tools in ways scientists cannot control. One participant cited the case of California, in which a plan to replace cash bail with pretrial risk assessment in 2019 was stopped by voters who feared algorithmic technology would lead to socioeconomic and racial discrimination (see S. bill 10, 2018; McGreevy, 2020). As a result of these opponents’ beliefs about the bias potential in machine learning algorithms, the state returned to its original cash bail system. To avoid outcomes like those in California, participants agreed that transparency is essential to implementing risk assessment tools into policy. When decisions to implement risk assessments are fully transparent to practitioners, the community, and those affected by the tools, concerns about the tools can be addressed early in the implementation process.

Scientists must first relay information about the tools and how they should be employed to policymakers. From there, policymakers need to determine the level of transparency they will require when authorizing or mandating the use of these tools. This includes deciding whether algorithms should be accessible to the public, how much detail tools’ performance reports should include, and who is accountable for demonstrating the positive effects of these tools on individuals under assessment and the community.

Policymakers must also consider in advance how they want to address the use of certain variables that may contribute to bias, like race, zip code, and criminal history. The approach policymakers take to address bias will depend on their specific goals, as different approaches present trade-offs both technically and practically (see Berk, 2009). If algorithmic results are made public, policymakers should require a description of what potential biases are present and how these often arise from inherent biases in the criminal justice system. Pilot programs involving risk assessments or performance reports could be used to show the public that newly implemented tools work better than previous procedures. No risk assessment will be perfect, and continued work to address biases is still needed. However, these tools have the potential to enable consistent decision-making freer from “human subjectivities, emotions, and prejudices,” which is an important step toward reducing bias in the criminal justice system (Osoba et al., 2019, p. 2). It is also important to adopt techniques for comparing model results with actual results. For example, the model output can be compared with parole decisions to evaluate how well populations under assessment match predicted risk levels.
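One simple way to compare model results with actual results is a calibration table that sets the average predicted probability in each risk band beside the observed outcome rate. The Python sketch below is a minimal illustration under stated assumptions: the function name `calibration_table`, the two equal-width bands, and the data are hypothetical, and the observed outcome is assumed to be coded 1 for recidivism and 0 otherwise.

```python
from typing import List, Tuple


def calibration_table(
    predicted: List[float], observed: List[int], n_bins: int = 2
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed rate, count) per band.

    Bands are equal-width slices of [0, 1]; empty bands are skipped.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # A probability of exactly 1.0 is placed in the last band.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            table.append((mean_p, rate, len(members)))
    return table


# Made-up predictions and outcomes, for illustration only.
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0, 0, 1, 0, 1, 0, 1, 1]

table = calibration_table(predicted, observed)
for mean_p, rate, count in table:
    print(f"predicted {mean_p:.2f}  observed {rate:.2f}  n={count}")
```

Large gaps between the predicted and observed columns would suggest that the risk levels no longer describe the assessed population well, which is one signal a jurisdiction could use when deciding whether revalidation is due.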

Policymakers and practitioners must also determine how much information is worth collecting in terms of potential return and cost. Recording a tremendous amount of information on individuals may be expensive and unnecessary if that information does not contribute meaningfully to determinations of recidivism risk. The type of information that is useful may also vary by location, highlighting the importance of creating and validating tools for each locale. Local jurisdictions will have to determine how often their tools should undergo revalidation. Policymakers can help ensure tools are working as intended by establishing training requirements for agencies that use and implement these tools and by requiring regular evaluations of the tools, as their predictive validity may change over time as populations change.

Conclusion

Practitioners must understand how to implement risk assessments in ways that positively impact those they work with and the larger community. Much of the work that allows this to occur happens during the tools’ design and initial implementation. It is also vital that all stakeholders continue to increase their knowledge and understanding of these instruments. Part of what needs to be understood, participants contend, is that risk assessments are not solutions but tools that help agencies make decisions regarding individuals, policies, or procedures. Practitioners should consider the model results alongside other factors that may affect the individuals being assessed but are not included in the algorithms. Ultimately, despite its drawbacks, predictive technology enables data-driven decision-making, which, when used properly, can help people exit the criminal justice system and serve the public interest. As one symposium participant aptly put it, risk assessments should be part of an ecosystem of solutions and not the sole vehicle for improving the criminal justice system.

Appendix

Table A1. Summary of Key Takeaways.

Design Considerations
• Partnerships between tool developers and practitioners are needed to facilitate and improve the tool’s design.
• Ongoing feedback and validation upon implementation should be considered during the initial design phase.
• Tools should be regularly updated according to real-time changes.
• A one-size-fits-all approach is infeasible. Developers should customize tools to meet jurisdiction and population needs.
• Developers need to anticipate errors and find a balance between the model’s level of detail/accuracy and interoperability/ease of use.

Implementation
• Tool developers and practitioners must align their goals and devise an implementation plan for targeted use and analysis of risk assessment results.
• Practitioners should devote increased attention to criminogenic needs and protective factors that moderate risk.
• Practitioners must be careful to neither underestimate nor overestimate risk assessment scores.
• Practitioners should exercise discretion sparingly and wisely.
• Users’ application of risk assessment tools should be continuously monitored.

Increasing Awareness and Understanding
• Increased transparency of tools is needed.
• Predictive performance and factors included in the assessment should be open access to the public, individuals under supervision, and relevant stakeholders.
• Interpretable explanations of how risk classifications are used and affect those under supervision may foster community trust.
• Risk categories can be misleading and result in overpredictions. A rank ordering system may communicate risk more clearly and accurately.
• All parties must recognize assessments’ limitations and potential biases.

Acknowledgements

The authors would like to acknowledge Angela Moore, Eric Martin, and Joel Hunt of the National Institute of Justice for their guidance and review of the research project and the Recidivism Challenge Winners Symposium participants.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The work was either partially or fully completed during the authors’ time as Research Assistants at the National Institute of Justice. Opinions or points of view expressed are those of the authors and do not necessarily reflect the official position or policies of the National Institute of Justice or the U.S. Department of Justice.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

D. Michael Applegarth

Raven A. Lewis

Author Biographies

D. Michael Applegarth, MSW, is a recent participant in the National Institute of Justice Research Assistantship Program and a doctoral candidate in the Social Welfare Department at the University of California, Los Angeles. His research interests include the intersection of the criminal legal system and mental health, the reentry process, desistance from crime, and juvenile justice.

Raven A. Lewis, MA, is a recent participant in the National Institute of Justice’s Research Assistantship Program and a doctoral candidate in the School of Criminal Justice at Rutgers University, Newark. Her research interests include the collateral consequences of imprisonment, support during and after prison release, and gender differences in family dynamics affected by incarceration.

Rachael M. Rief, PhD, is a recent participant in the National Institute of Justice’s Research Assistantship Program and is currently serving as a post-doctoral researcher. Her research interests include police practice, women in policing, police culture, and police organizations.

References

Barnes-Lee, A. R. (2020). Development of protective factors for reducing juvenile reoffending: A strengths-based approach to risk assessment. Criminal Justice and Behavior, 47(11), 1371–1389. https://doi.org/10.1177/0093854820949601

Berk (2009). The role of race in forecasts of violent crime. Race and Social Problems, 1, 231–242. https://doi.org/10.1007/s12552-009-9017-z

Berk (2017). An impact assessment of machine learning risk forecasts on parole board decisions and recidivism. Journal of Experimental Criminology, 13(2), 193–216. https://doi.org/10.1007/s11292-017-9286-2

Berk, Heidari, Jabbari, Kearns, & Roth (2021). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1), 3–44. https://doi.org/10.1177/0049124118782533

Berk & Hyatt (2015). Machine learning forecasts of risk to inform sentencing decisions. Federal Sentencing Reporter, 27(4), 222–228. https://doi.org/10.1525/fsr.2015.27.4.222

Brayne & Christin (2021). Technologies of crime prediction: The reception of algorithms in policing and criminal courts. Social Problems, 68(3), 608–624.

Burrell (2016). How the machine “thinks”: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 205395171562251. https://doi.org/10.1177/2053951715622512

Desmarais, S. L. (2017). Commentary: Risk assessment in the age of evidence-based practice and policy. International Journal of Forensic Mental Health, 16(1), 18–22. https://doi.org/10.1080/14999013.2016.1266422

Desmarais, S. L., Johnson, K. L., & Singh, J. P. (2016). Performance of recidivism risk assessment instruments in US correctional settings. Psychological Services, 13(3), 206–222. https://doi.org/10.1037/ser0000075

Duwe (2021). Evaluating bias, shrinkage and the home-field advantage: Results from a revalidation of the MnSTARR 2.0. Corrections, 1–23. https://doi.org/10.1080/23774657.2021.2011802

Duwe & Clark (2013). The effects of private prison confinement on offender recidivism: Evidence from Minnesota. Criminal Justice Review, 38(3), 375–394.

Fazel, Singh, J. P., Doll, & Grann (2012). Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24 827 people: Systematic review and meta-analysis. British Medical Journal, 345, Article e4692. https://doi.org/10.1136/bmj.e4692

Garb, H. N., & Wood, J. M. (2019). Methodological advances in statistical prediction. Psychological Assessment, 31(12), 1456–1466. https://doi.org/10.1037/pas0000673

Garrett, B. L., & Monahan (2019). Assessing risk: The use of risk assessment in sentencing. Judicature, 103(2), 42–48. https://judicature.duke.edu/articles/assessing-risk-the-use-of-risk-assessment-in-sentencing/

Garrett, B. L., & Monahan (2020). Judging risk. California Law Review, 108, 439–493. https://doi.org/10.15779/Z38B56D515

Ghasemi, Anvari, Atapour, Stephen Wormith, Stockdale, K. C., & Spiteri, R. J. (2021). The application of machine learning to a general risk–need assessment instrument in the prediction of criminal recidivism. Criminal Justice and Behavior, 48(4), 518–538. https://doi.org/10.1177/0093854820969753

Goodley, Pearson, & Morris (2021). Predictors of recidivism following release from custody: A meta-analysis. Psychology, Crime & Law, 1–27. https://doi.org/10.1080/1068316X.2021.1962866

Green (2020). The false promise of risk assessments: Epistemic reform and the limits of fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 594–606). ACM. https://doi.org/10.1145/3351095.3372869

Hamilton (2015). Back to the future: The influence of criminal history on risk assessments. Berkeley Journal of Criminal Law, 20(1), 75–133. https://ssrn.com/abstract=2555878

Hamilton, Kowalski, M. A., Kigerl, & Routh (2019). Optimizing youth risk assessment performance: Development of the Modified Positive Achievement Change Tool in Washington State. Criminal Justice and Behavior, 46(8), 1106–1127.

Hermstrüwer (2019). Artificial intelligence and administrative decisions under uncertainty. In Wischmeyer & Rademacher (Eds.), Regulating artificial intelligence (pp. 199–223). Springer.

Hu, Freeman, K. R., Jannetta, & Kim (2021). Communicating risk information for effective decisionmaking. Urban Institute. https://www.urban.org/sites/default/files/publication/103865/communicating-risk-information-for-effective-decisionmaking.pdf

Hudgins, White, Applegarth, D. M., & Hunt (2022, March 21). Results from the National Institute of Justice recidivism forecasting challenge. https://nij.ojp.gov/topics/articles/results-national-institute-justice-recidivism-forecasting-challenge

Kigerl, Hamilton, Kowalski, & Mei (2022). The great methods bake-off: Comparing performance of machine learning algorithms. Journal of Criminal Justice, 82, 101946. https://doi.org/10.1016/j.jcrimjus.2022.101946

Klingele (2020). Making sense of risk. Behavioral Sciences & the Law, 38(3), 218–225. https://doi.org/10.1002/bsl.2458

Krauss, D. A., Cook, G. I., & Klapatch (2018). Risk assessment communication difficulties: An empirical examination of the effects of categorical versus probabilistic risk communication in sexually violent predator decisions. Behavioral Sciences & the Law, 36(5), 532–553. https://doi.org/10.1002/bsl.2379

Latessa, E. J., & Lovins (2010). The role of offender risk assessment: A policy maker guide. Victims and Offenders, 5(3), 203–219. https://doi.org/10.1080/15564886.2010.485900

Levin, S. K., Nilsen, Bendtsen, & Bulow (2016). Structured risk assessment instruments: A systematic review of implementation determinants. Psychiatry, Psychology and Law, 23(4), 602–628. https://doi.org/10.1080/13218719.2015.1084661

McGreevy (2020, October 7). California voters to decide whether to end cash bail system with Proposition 25. Los Angeles Times. https://www.latimes.com/california/story/2020-09-14/california-voters-referendum-end-cash-bail-system-proposition-25

Miller, W. T., Campbell, C. A., Papp, & Ruhland (2022). The contribution of static and dynamic factors to recidivism prediction for Black and White youth offenders. International Journal of Offender Therapy and Comparative Criminology, 66(16), 1779–1795.

National Institute of Justice. (n.d.). Recidivism forecasting challenge. https://nij.ojp.gov/funding/recidivism-forecasting-challenge

National Institute of Justice. (2021, July 28). Recidivism forecasting challenge: Official results. https://nij.ojp.gov/funding/recidivism-forecasting-challenge

Nelson, R. J., & Vincent, G. M. (2018). Matching services to criminogenic needs following comprehensive risk assessment implementation in juvenile probation. Criminal Justice and Behavior, 45(8), 1136–1153. https://doi.org/10.1177/0093854818780923

Osoba, O. A., Boudreaux, Saunders, J. M., Irwin, J. L., Mueller, P. A., & Cherney (2019). Algorithmic equity: A framework for social applications. RAND. https://www.rand.org/pubs/research_reports/RR2708.html

Popp (2017, August 28). Black box justice. The Pennsylvania Gazette. https://thepenngazette.com/black-box-justice

Rief, R. M., Lewis, R. A., & Applegarth, D. M. (2023). In pursuit of fairness: A review of themes of gender responsiveness and racial bias in actuarial risk assessment tools. Manuscript in preparation.

Ritter (2013). Predicting recidivism risk: New tool in Philadelphia shows great promise. National Institute of Justice Journal, 271(February), 4–13. https://www.ojp.gov/pdffiles1/nij/240696.pdf

S. bill 10. (2018). Pretrial release or detention: Pretrial services. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180SB10

Selbst, A. D., & Barocas (2018). The intuitive appeal of explainable machines. Fordham Law Review, 87, 1085–1140. https://doi.org/10.2139/ssrn.3126971

Starr, S. B. (2015). The risk assessment era: An overdue debate. Federal Sentencing Reporter, 27(4), 205–206. https://doi.org/10.1525/fsr.2015.27.4.205

Tonry (2019). Predictions of dangerousness in sentencing: Déjà vu all over again. Crime and Justice, 48(1), 439–482. https://doi.org/10.1086/701895

Viljoen, J. L., Cochrane, D. M., & Jonnson, M. R. (2018). Do risk assessment tools help manage and reduce risk of violence and reoffending? A systematic review. Law and Human Behavior, 42(3), 181–214. https://doi.org/10.1037/lhb0000280

Werth (2019). Risk and punishment: The recent history and uncertain future of actuarial, algorithmic, and “evidence-based” penal techniques. Sociology Compass, 13(2), Article e12659. https://doi.org/10.1111/soc4.12659

White, Applegarth, D. M., Hunt, & Hudgins (2022, February). The NIJ recidivism forecasting challenge: Contextualizing the results (NCJ 304110). U.S. Department of Justice, Office of Justice Programs, National Institute of Justice. https://nij.ojp.gov/library/publications/nij-recidivism-forecasting-challenge-contextualizing-results

White, Rief, Hudgins, C. D., Pimsler, M. L., & Hunt (2023). Meta-analysis of NIJ forecasting challenge winning reports. Report in preparation.

Wormith, J. S. (2017). Automated offender risk assessment: The next generation or a black hole? Criminology & Public Policy, 16(1), 281–303. https://doi.org/10.1111/1745-9133.12277