Abstract
Kat Albrecht and Kaitlyn Filip on data disadvantaging defendants.
In courtrooms, research labs, and policy groups across the United States, “data law” is on the rise. Not only do we use more and more empirical data to analyze courts, but courtrooms themselves use more and more data as part of active cases. But with the increased expectation that cases will use legal data, the burden of producing such high-quality data has shifted to defendants and claimants, rather than the state or the courts, which have far more substantial access. For instance, in Racial Justice Act hearings in California, making a preliminary showing requires the defense to obtain data from the state (which often redacts significant information or shares it in an unanalyzable form), analyze the data to demonstrate inequality in criminal sentencing, and answer for the quality of that data in court. This state of affairs, as we will show, limits the utility of data in the courtroom and exacerbates existing biases and disadvantages in legal proceedings.
The Rise of Data Law
The rise of data-informed law, or “data law,” spans multiple domains such that data is entering legal research and active legal cases in multiple new ways. In response, and in part capitalizing on the alleviation of certain computational constraints, the field of computational legal analytics has expanded dramatically. Innovations in this space include the development of new computational methods and datasets both inside and outside the traditional academy.
Non-profits and research groups have also undertaken substantial efforts to make data more available in different domains in order to accommodate the needs of defendants and their legal teams. The non-profit Measures for Justice has made various criminal legal data available at the county level through a public-facing data portal. The Criminal Justice Administrative Records System (CJARS) built a massive national data infrastructure for collecting and harmonizing criminal data at multiple process points. And Systematic Content Analysis of Litigation Events (SCALES) developed an A.I.-powered data platform that makes details and insights about the federal courts open-source and open access, helping to fill a substantial data gap, particularly in the civil court space.
In tandem with new methods and increased interest in building court data infrastructures, individual courts have even begun transparency projects around their own records. Courts in states such as Florida, Ohio, and Illinois have launched or announced public-facing portals that will allow the general public to download and access criminal data. Other states, like Pennsylvania, have developed research pipelines that allow for the systematization of data requests and the development of Memorandums of Understanding for data use at a much larger scale. The confluence of these factors has increased the potential use of court data for traditional research, in policy arguments, and in active legal cases.
Public Access to Court Data
There is a clear Constitutional imperative for court transparency. The First and Sixth Amendments guarantee the right to a public trial, and the Supreme Court has consistently affirmed that right, including in Richmond Newspapers v. Virginia. Yet court data remains less than meaningfully accessible, and the data collected is not necessarily meaningful data. Even technically public data portals can create additional barriers to public access. For example, the “Public Access to Court Electronic Records” (PACER) system should theoretically make a project like SCALES unnecessary. However, the PACER interface is outdated, opaque, and, as discovered by the SCALES team, prohibitively expensive. Procuring partial data records for a single year at the federal level alone cost over $100,000.
As to meaningless data, the problem is particularly pernicious in civil courts, where even the most basic descriptive information about courts is often unknowable to researchers, policy makers, and court actors. That’s because it is simply not collected. This makes civil policy work particularly difficult in terms of making strong arguments in favor of (or against) various policies or reforms. For example, courts may implement programs designed to improve access for self-represented litigants but may be unable to measure their impact or use due to a lack of available ongoing data on even the basic issue of representation.
For instance, the Cook County Domestic Relations Division in Illinois has been using administrative Hearing Officers to facilitate services for self-represented litigants for six years. Yet it is hard to know how impactful the program has been because there’s little available data from civil courts to compare it to.
Using Data in Court
Courts are best positioned to collect, store, and publish court data. However, the burden of data production and analysis has been shifted onto defendants and claimants, further exacerbating inequalities in the courtroom. Let’s look again to California’s Racial Justice Act (RJA). It is meant to guard against racial disparity, but in practice presents a host of data challenges that courts have yet to surmount. Signed into law on September 29, 2022, the RJA aims to alleviate the burden, in part reified through McCleskey v. Kemp, on defendants to prove intentional discrimination by allowing for arguments based in racially coded language, racial disparity in charges and convictions, and racial disparities in sentencing. As the first wave of cases enters the courts, however, there is little sense of what sorts of data will be considered adequate proof.
We do know that to be entitled to a hearing under the RJA, a defendant must make a prima facie showing that racial discrimination affected their case. Technically, this requires only proof that there is a “substantial likelihood” that discrimination occurred, but the expectation is that defendants will produce high-quality, particularly statistical, data to undergird that claim. Because the RJA lacks teeth in forcing high-quality data production from the state, it becomes the defense’s responsibility to obtain and analyze data from the prosecution or the Department of Corrections and Rehabilitation in order to allege the bias of those bodies. Compelling this discovery has proven to be difficult and expensive for defense teams that are often working on limited funds provided by aid organizations.
Incredibly, the defense is required to make a data argument even when the requisite data has been poorly curated or not collected in the first place. In other words, the state demands proof that the state itself must produce and provide.
Recommendations
Courts need to reimagine the way data is collected, stored, and shared with the public and with court actors. Importantly, courts need to reckon with the fact that data expectations disadvantage already under-resourced defendants and claimants. Specifically, when critiquing the ability of defendants and claimants to make prima facie arguments with data, judges need to hold the state and state institutions to higher data (and data-sharing) standards.
This implies two related changes. First, courts must routinely grant significant data discovery. Second, courts must place the burden of data quality on those institutions that maintain the data rather than shifting the burden to individual defendants and claimants. Ultimately, that means funding data collection, data storage, and data production while facilitating the right of the public to inspect, and use, court records in and beyond individual cases.
