Abstract
Should court judgments be publicly available for text and data mining purposes? This article shows that the arguments for and against access to judgments conflate different understandings of what judgments are. On one view, judgments are seen as a ‘jurisprudential’ category, whereas the other view regards them as something ‘factual’. Once it is understood that these views and the claims based on them do not fight over the same territory, it should be easier to make judgments more widely available, including for the purposes of computational analysis of judgments as bulk data. The purpose of this article is to help to clear the ground for the debate around access to judgments as bulk data and highlight some relevant considerations for the preferred licensing regime concerning judgments.
In 2022, the UK government's National Archives launched a new centralised database for court judgments in England and Wales (see Cross, 2022; The National Archives, 2022a). Such an archive, should it become comprehensive and should the judgments become easily accessible in bulk, presents an irresistible attraction for tech entrepreneurs. The database would allow them to mine the data in order to, for example, provide better legal services or inform justice sector policies. At the same time, the database promises to increase public access to legal information contained in those judgments.
It is imperative in this context that the emerging debate around the computational analysis of judgments as bulk data is framed adequately from the start. The purpose of this article is to help to clear the ground for this debate and highlight some relevant considerations for the preferred licensing regime concerning judgments. Along the way, I will also summarise recent developments in this area.
My main point is that the debate about whether judgments should be made publicly available suffers from a conflation of different understandings of what judgments are. On one view, judgments are seen as a ‘jurisprudential’ category, whereas the other view regards them as something ‘factual’. Concerns relating to ‘factual’ judgments are then mistakenly seen as a reason to limit both public access to judgments and data-driven analysis of judgments as bulk data. To illustrate my point, I discuss the current developments in the United Kingdom, but the main claim is conceptual and is thus relevant for the future of online courts and data-driven legal services globally.
A little bit of background
The digitisation of English courts is well under way (HM Courts & Tribunals Service, 2018; HC Justice Committee, 2022) and with it, there has been an increased pressure to collect and publish relevant court records suitable for bulk analysis (Rozenberg, 2020; Byrom, 2019). In many ways, the potential of AI-driven analysis of court data is evident, and the ‘Data First’ initiative – through which criminal court datasets are made available to accredited researchers for all kinds of computational analyses – is a testament to the Ministry of Justice's support for bulk data analysis (Ministry of Justice, 2020). Imagine, for example, that judges could write their judgments using GANs (generative adversarial networks) trained on all the available judgments. Or imagine that you could ‘predict’ the outcome of your dispute (Medvedeva et al., 2023). A statistical analysis of judgments could also help us better to understand the key trends in judicial decision-making, identify court biases, or spot outlier practices. This all could bring down legal costs and ultimately increase access to justice (Janeček, 2021).
The Achilles’ heel of any data-driven vision of the English justice system is, however, the unavailability of data. As observed by Shubber, ‘black holes of information are common in English law’ (2022). I do not want to discuss why such black holes exist (see e.g. Hoadley, 2022). With Hoadley and others, it suffices to highlight that ‘an important part of … law is not meaningfully public’ (Hoadley et al., 2021; Hoadley, 2018).
You may think that the English judiciary or the Ministry of Justice holds somewhere an ultimate record of all judgments, but that is unfortunately not the case. Despite court judgments being official public records (s 8 of the Public Records Act 1958), there is no comprehensive public database of digitised court judgments and the new database by The National Archives does not change that (Magrath and Beresford, 2023).
Commercialisation of judgments
One difficulty behind the limited availability of court judgments is that they do not emerge from any standardised production pipeline with a transparent audit trail. Anecdotal evidence (gathered through this author's informal interviews) suggests that it is often down to pure happenstance whether a judgment ends up in an archive or not. Each court, and often also each judge, has its own way to produce and archive the ultimate version of the record of their judgment.
Such lack of standardisation creates a gap that is, so far, filled by commercial entities whose mission is not free access to judgments, let alone free access to judgments as bulk data. These private entities – mainly legal publishers – have secured themselves privileged access to judgments, be it through their relations with transcription agencies, with court administrators, or through early access to scattered copies of judgments at various locations across the country. And this advantageous position in the production pipeline now allows them commercially to exploit access to judgments so obtained.
Research suggests that commercial publishers in fact already control access to a vast majority of recorded judgments (Hoadley, 2022). As put by Walker (2022), it is ‘an uncomfortable truth that the Ministry of Justice has quietly let one of their biggest assets [i.e., the court judgments] be monetised by international companies … [although] it appears that the public would not be comfortable at all with this status quo’. Indeed, members of the public ‘were alarmed about the lack of transparency of the data sets held by [private publishers] and about the possibility that they have more court judgments than are publicly available’ (Gisborne et al., 2022: 25).
Interestingly, this situation remains largely unchanged despite the fact that the private publishers are no longer in any legitimate position to determine access and usage rights when it comes to judgments. In the pre-digital era, the publishers’ control might have been justified on the basis that they helped produce and disseminate law ‘reports’ which were often the only record of a court judgment. It was thus, on balance, better to have this privately controlled data – produced by lawyers for lawyers – than to have no record of the existing law at all (see Bryan, 2009, 2012). But nowadays, when the law can be reported directly by way of facilitating access to the electronic copies of judgments produced by publicly funded courts, it is difficult to justify the publishers’ claim to control access to judgments.
Access to judgments as bulk data
There is a powerful argument that all judgments should be publicly available because they are records of the law, that is, records of the public (as opposed to private) legal information, and because individuals are expected to comply with the law (Mitee, 2017, 2019). From a constitutional and human rights viewpoint, access to judgments is thus framed around individuals’ right of access and around the public interest in access to law. Indeed, according to the Public Records Act 1958, judgments should be ‘available to the public for inspecting and obtaining copies’ (s 5(3)). Likewise, according to The National Archives’ Open Justice Licence, everyone is ‘free to copy, publish, distribute[,] transmit [and even to] … exploit [the judgments] commercially’ (The National Archives, 2022b).
By contrast, ‘computational analysis’ of judgments is not permitted under the Open Justice Licence (The National Archives, 2022b). Similarly, the British public would prefer judgments to be bulk-analysed if and only if the analysis would to some extent be in the public interest (Gisborne et al., 2022: 37, 45). A new report ‘Justice Data Matters: Building a public mandate for court data use’ (Gisborne et al., 2022) – which is the first serious attempt to ascertain what members of the public think about court records and their use – points to a yawning gap between the reality of court data control and processing on the one hand and the public's awareness and attitudes concerning the use of such data on the other hand. For example, 50% of the respondents (from a representative sample of 2164 adults in Great Britain, aged between 16 and 75) felt uncomfortable about technology companies being able to access and use information from court records (Gisborne et al., 2022: 53f). And almost two-thirds felt that the government keeps the public poorly informed about the current uses of information from court records (Gisborne et al., 2022: 53).
Does this mean that, given the surge of technology companies interested in the data, public access to judgments should be abolished or at least strictly limited? On the one hand, members of the public are rightfully entitled to free access to judgments; on the other hand, the same members do not want judgments to be freely accessible. The ‘Justice Data Matters’ report reveals ‘mixed levels of comfort about court judgments being publicly available’ (Gisborne et al., 2022: 24). Based on the report, the public seems concerned about the sensitivity of information contained in the judgments, such as the parties’ names, their religious beliefs, information about their activities, about their reputation, and so on. In this regard, the public's concerns play into the hands of commercial publishers. It is clearly not in the publishers’ economic interest to have more judgments available, since that would erode their competitive advantage and open the market for legal research services to new entrants.
A misleading framing and the lawyers’ mindset
If the debate about public access to judgments and about their availability as bulk data is to make any significant progress, it is desirable that we properly understand the difference between the availability of judgments in the ‘jurisprudential’ and ‘factual’ sense. On the one hand, judgments are sources of legal information and they are relied on as precedents for future cases. In this ‘jurisprudential’ sense, judgments have an effect in both the real world (as they adjudicate disputes) and the normative world (as they establish precedents and affect existing case law).
On the other hand, judgments are data that can be bulk analysed and mined for non-jurisprudential insights such as court biases or, for example, used as training data for machine learning. In this ‘factual’ sense, judgments are a source open to data-driven analysis. The source contains some observable data points which can be read and analysed by machines.
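To make the ‘factual’ reading more concrete, the following is a minimal sketch in Python of what it might mean for a machine to read observable data points from judgment texts. Everything in it is an assumption made for illustration: the sample texts and citation numbers are invented, the citation pattern is a deliberately simplified version of the neutral citation format, and the function and field names are hypothetical. It is not a description of any existing archive or tool.

```python
import re
from collections import Counter

# Invented sample texts with made-up citation numbers (illustration only).
SAMPLE_JUDGMENTS = [
    "[2021] EWHC 101 (Ch) The claim is dismissed.",
    "[2022] EWCA Civ 202 The appeal is allowed.",
    "[2022] EWCA Civ 303 The appeal is dismissed.",
]

# Simplified assumption: a citation looks like "[year] COURT number".
# Only a handful of court codes are covered here.
CITATION = re.compile(r"\[(\d{4})\]\s+(EWHC|EWCA Civ|EWCA Crim|UKSC)\s+(\d+)")


def observable_features(text):
    """Read a few surface-level data points from one judgment text."""
    match = CITATION.search(text)
    return {
        "year": int(match.group(1)) if match else None,
        "court": match.group(2) if match else None,
        "word_count": len(text.split()),
        "mentions_dismissed": "dismissed" in text.lower(),
    }


def judgments_per_court(texts):
    """Count judgments by court code across the whole collection."""
    return Counter(observable_features(t)["court"] for t in texts)


if __name__ == "__main__":
    for text in SAMPLE_JUDGMENTS:
        print(observable_features(text))
    print(judgments_per_court(SAMPLE_JUDGMENTS))
```

Nothing in this sketch engages with what any judgment decides as a matter of legal doctrine. It only counts and classifies surface features of the text, which is the sense in which the same record can be approached as a ‘factual’ rather than a ‘jurisprudential’ source.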
Importantly, access to judgments as a ‘factual’ category is not the same as access to judgments as a ‘jurisprudential’ category. It is the same data, but looked at from a different viewpoint. And yet there is a widespread belief that judgments are always to be seen as a jurisprudential record of the law or, to put this in lawyers’ terminology, that judgments are a special type of record that is to be studied and analysed by way of doctrinal legal analysis only. This is reinforced by the typical lawyers’ mindset and, not surprisingly, lawyers still dominate the debates concerning access to judgments.
Lawyers are trained to analyse judgments in a highly specific manner. This mindset allows them to deal with legal issues in a standardised way and, as a result, to participate in technical legal debates and procedures. At the same time, however, the lawyer's mindset is not very helpful when it comes to data-oriented thinking (Janeček et al., 2021: 5–7). Paradoxically, we lawyers (yes, I am one of them) tend to say that our reasoning is evidence-based and that we use court judgments as our data points, as if our legal research and legal reasoning were data-driven. Yet typical legal thinking is worlds apart from statistical data-driven reasoning, and lawyers easily confuse the former with the latter.
Many lawyers cannot let go of the idea that judgments – either individually or in bulk – are always something jurisprudential, not factual. To lawyers, judgments form an intrinsic and indispensable part of the existing jurisprudence (‘jurisprudence’, in this sense, denotes a set of legal doctrines derived from a collection of judgments) which is why they are typically unable, or perhaps unwilling, to view data-driven analysis of court decisions as an exercise in ‘judgment’ analysis in any meaningful sense.
Doctrinal legal analysis takes judgments to be the source of jurisprudential (not factual) information, which is why such analysis – and lawyers have the doctrinal techniques baked in their mindset – cannot be performed by way of text and data mining techniques. If law is a set of doctrines derived from judicial precedents, so the argument goes, then statistical analysis of judgments does not and cannot identify such doctrines. Legal doctrines are never recorded in judgments at a systems level because a legal doctrine is not an observable quality of a dataset. In other words, the jurisprudential view holds that legal doctrines (from an internal, jurisprudential point of view (Shapiro, 2006)) are not and cannot be distributed across the dataset, because doctrines simply do not have any such statistical properties.
Lawyers thus often tend to disregard arguments about access to judgments as bulk data, because they think these are not arguments about access to judgments in the first place. When I talk to law professionals about archiving of judgments and how these records might be reused for text and data mining purposes, it often feels like hitting a wall. It is not uncommon for lawyers to ignore that judgment records are sources of data at various levels of abstraction (Floridi, 2008) and that doctrinal analysis of judgments engages with the records at just one of those levels. Accordingly, debates around access to judgments for text and data mining purposes are often at cross-purposes. The traditional legal stakeholders talk about access to judgments in a jurisprudential sense, while others talk about access to judgments in a factual sense.
The way forward
Do not get me wrong. I am not suggesting that we abandon the jurisprudential view. It has its useful role and underlies the whole practice of the law. My point is that we should strive to avoid confusing judgments as textual records of factual information with judgments as sources of legal doctrines. The supposed public interest in free access to law, or more precisely to public legal information (Mitee, 2019: 34f), does not apply (at least not directly) to judgments in a factual sense. And vice versa, the fact that sensitive information can be mined from the text of judgments does not imply that these judgments should not be available as a jurisprudential category.
We saw that the public are concerned about the availability of judgments for bulk analysis for reasons that are not attributable to such analysis. It is true that all sorts of factual information contained in the judgments are best suited for statistical analysis, but the statistical analysis does not itself produce that concerning information. So just like jurisprudential analysis of judgments is unconcerned with sensitive information relating to specific individuals, statistical analysis is also unconcerned with such individual-level insights. The bulk data analysis can reveal insights about the whole dataset and produce information going above and beyond individual judgments, but there is no a priori reason to be concerned about those insights. In fact, such insights might be in the public benefit (especially if they are publicly shared).
Relatedly, the risks stemming from nefarious computational analyses of judgments (see Adams et al., 2022) should not be confused with the risks of making the data available or with the risks of analysing them as ‘factual’ judgments. To use an analogy, if one can use a pencil to physically injure others, it does not automatically follow that pencils should not be publicly available. We may want to regulate nefarious uses of the insights produced by computational analysis of judgments as bulk data, but that does not mean that we should ban the analysis itself, let alone limit public access to judgments.
Of course, the system of justice and the court judgments are publicly funded resources. The public thus have an indisputable mandate to determine how these data are accessed and used. The public can demand, for example, that judgments be available as bulk data only for a public benefit, as indeed does the DSM Directive when it comes to text and data mining (Directive (EU) 2019/790, Arts 3 and 4; see also Recital 6 of the DSM Directive and Recital 8 and Art 1(2)(c) of Directive (EU) 2019/1024 on open data and the re-use of public sector information). But the question we should ask in this regard is whether the public mandate concerns both ‘factual’ and ‘jurisprudential’ judgments. If not, we may need to identify new reasons why it might be in the public interest to limit (or not to limit) access to all records of judgments as bulk data (see Aidinlis et al., 2020; Adams et al., 2022; Hoadley et al., 2021). It is commendable to consult the public on those new reasons, as was the case in the ‘Justice Data Matters’ report, but it is also commendable to make sure that the public understand the key distinctions set out in this article. Otherwise, there is a risk that access to judgments will become much more restricted than it needs to be.
So what does this mean for the licensing of judgments as bulk data? You may think that if we can somehow distinguish the ‘jurisprudential’ judgments from the ‘factual’ judgments, then we need a special licensing regime for judgments as bulk data. But if ‘jurisprudential’ and ‘factual’ judgments are simply two ways of looking at the same data, then all these data should be subject to the same licensing regime (but cf. The National Archives, 2022b; BAILII, 2022). The argument advanced in this paper is intended to help us to see that there are no necessary trade-offs between making the data available as a ‘jurisprudential’ category versus making them available as a ‘factual’ category.
Besides, as I tried to show, it is difficult to conceive that the computational analysis would engage with (let alone reproduce) judgments in the jurisprudential sense. It is thus equally difficult to justify control over access to judgments as bulk data by reference to copyright, regardless of whether the access and usage rights concerning ‘jurisprudential’ judgments are determined by judges (McCloud, 2021: [11]), parties’ counsel and solicitors (Vos et al., 2022), the private publishers (see Leith and Fellows, 2009), The National Archives (2022c), BAILII (2022), the UK Government (2022) or by those who first get hold of a judgment transcription (HM Courts & Tribunals Service, 2022).
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
