Abstract
This article presents a systematic and critical assessment of the reliability of forensic science in New Zealand. It documents the types of forensic science offered in criminal cases, the party presenting the evidence, the experts’ affiliations, how often the admissibility of the expert evidence is challenged and the timing of those challenges in the proceedings, how often experts rely upon the uniqueness assumption, and how often experts testify to an individualised identification or ‘match’ of a source of forensic evidence. It finds that several of the common forensic disciplines in the New Zealand criminal justice system have been the subject of critique and criticism internationally, that expert evidence was most commonly presented by the prosecution and provided by institutional police laboratories, and that in most cases the forensic expert testified either to the uniqueness assumption or to an individualised match determination. It concludes that the New Zealand Parliament should amend the Evidence Act 2006 to require a demonstration of foundational validity and as-applied reliability as a precondition to the admissibility of any purported scientific evidence.
Introduction
Forensic scientists are generally called as expert witnesses to offer opinions about whether two items of evidence (one associated with the crime and one associated with a suspect) appear to be similar or indistinguishable and whether the observed similarity supports the conclusion that both items of evidence derived from the same source. Many of these traditional pattern-matching ‘sciences’ were developed by police, lack controlling standards, and have never been scientifically validated. 1 Most involve pattern matching: visual comparison of crime-scene evidence and a suspect's exemplar to determine whether the two ‘match’. Fingerprint, handwriting, firearm, shoe-print and tyre comparison are pattern-matching disciplines, none of which has been subjected to serious scientific vetting. 2 They lack known error rates or baseline data on the frequency of the features compared.
The Institute of Environmental Science and Research (ESR) regularly engages in blood-pattern analysis; microscopic glass, paint and fibre comparison; and firearm comparison – techniques that have been criticised in other countries. The Forensics Unit of the New Zealand Police makes ‘identifications’ of latent fingerprints 3 and questioned documents and testifies to ‘matches’.
Lawyers and judges often have insufficient training in scientific methodology and fail to understand the reliability concerns with this evidence. Compounding this lack of competency, the New Zealand Supreme Court has adopted the American
While there have been high-profile inquiries in other countries, 6 there has never been a systematic assessment of the reliability of forensic science in Aotearoa New Zealand. Aotearoa New Zealand has the opportunity to examine critically the reliability of its scientific evidence proactively, before it has the database of wrongful convictions caused by junk science that so many other countries have.
The admissibility of expert opinion evidence
In 1993, in
Adversarial failures
The NRC Report concluded that trial safeguards had been ‘utterly ineffective’ in ensuring the reliability of forensic science evidence. 11 Edmond et al (2014) note how cross-examination and defence experts are poor vehicles for challenging methodological problems with forensic science. 12 They note how legal actors are particularly poor at identifying, and explaining to juries, the threat posed by exposing analysts to domain-irrelevant information, increasing the risk of wrongful conviction. 13
Judicial gatekeeping
Shortly after
Today, the admissibility of expert evidence is governed primarily by Section 25 of the Evidence Act 2006, which requires that expert evidence be ‘substantially helpful’ in order to be admitted. 16
The Court of Appeal has admonished that the Section 25 substantial-helpfulness test requires a consideration of an ‘amalgam’ of logical relevance, reliability and probative value. 17
As the Court recently explained in Under s 25(1) of the Evidence Act, an expert opinion is admissible if the fact-finder is likely to obtain substantial help from the opinion in understanding other evidence or ascertaining any fact that is of consequence to the determination of the proceeding. . . . [I]t is axiomatic that if the fact-finder is to be helped to ascertain facts, expert opinion evidence must meet a threshold of reliability. Otherwise the evidence will hinder, and potentially mislead rather than help. . . . [I]n the scientific field whether a methodology is satisfactory . . . must depend ultimately on the response that is given to it by the relevant scientific community. The robustness of a methodology cannot legitimately be established by an inexpert judge or jury. The essential work of validation must occur before the courtroom is entered. 18
Despite this clear framework for assessing the reliability of scientific evidence as a precondition to its admissibility, the New Zealand courts have tended to be lax in their approach to gatekeeping. In
The jury found Shepherd guilty of aggravated robbery. 25 Shepherd's principal ground for appeal was that the admission of the facial mapping evidence had resulted in a miscarriage of justice. 26 Nonetheless, his appellate counsel ‘made it clear he was not challenging the acceptance of facial mapping or the use of such evidence for identification purposes’ because he ‘recognised that the courts in New Zealand, Australia and the United Kingdom have accepted that such evidence can be admissible in criminal proceedings, even though some limitations or qualifications on the use of such evidence have been expressed’. 27 Instead, counsel focused primarily on the qualifications of the Crown's expert and the helpfulness of the testimony to the jury. 28
Regarding the defence expert's critique that the Crown's witness compared the two images in the absence of a sufficient minimum number of pixels to make the comparison reliable, the Court noted that the Crown's expert ‘did not accept . . . that this made his analysis unreliable’. 29
Regarding the absence of any statistical basis with which to interpret the significance of a purported ‘match’, the Court opined: Despite the reservations expressed by critics . . ., we see no reason in principle why facial mapping evidence should not be admitted in appropriate cases. We do not view the absence of a database, which would provide a means of quantifying the statistical likelihood of a match between, for example, an image of an offender captured on CCTV and a photograph of the accused as being a disqualifying factor. 30
In summary, the Court concluded that the expert's ‘knowledge and expertise in this field was sufficient to qualify him as an expert and the evidence he gave was within his area of expertise’, 32 but failed to assess whether the expert was qualified to render the specific expert opinion that he gave – namely, that facial mapping reliably indicated a high likelihood that Shepherd was the person in the CCTV video. It suggested that ‘guidelines or protocols’ governing facial mapping were important ‘to ensure the authenticity, integrity and reliability of facial mapping’, but did not treat the current absence of guidelines as a factor inuring against admission of the testimony in the meantime. 33
Unfortunately, there is now a growing list of individuals in other countries known to have been wrongly ‘matched’ to suspects through the specific technique at issue in
In Scotland, William Mills was similarly convicted of bank robbery after two police officers ‘identified’ him from CCTV stills of the robber taken from bank surveillance video. 37 After six months of imprisonment, he was exonerated on appeal when he was able to introduce fresh DNA evidence from the bank that implicated another suspect with a prior history of bank robbery. 38
This project seeks to evaluate the state of forensic-science evidence in the New Zealand criminal courts. It evaluates whether the Evidence Act 2006 is sufficient to ensure the validity and reliability of the scientific evidence used in criminal trials in Aotearoa New Zealand. It reviews Section 25 of the Evidence Act 2006 (EA) in light of
International literature
The exoneration of hundreds of defendants convicted by faulty forensic science, studies revealing a lack of validity in many forensic disciplines and crime-lab scandals have led to an international call by scientists and scholars for better regulation of the admission of scientific evidence in criminal trials.
United States
In 1998, Michael Saks examined the performance of the courts in evaluating expert forensic identification evidence. 41 He concluded that the law's regulation of purported scientific evidence was ‘riddled with contradictions, confusion and chaos’. 42 He found that courts abdicated their intellectual and institutional duty to decide the validity of empirical claims in deference to the ‘authority’ of the very witnesses whose offerings were to be evaluated. 43
He found that forensic witnesses could evade
In 2000, Michael Risinger examined criminal defendants’
In 2007, Paul Giannelli examined forensic laboratory failures in the United States. 46 He found that there were several recurring issues across the labs, including cognitive bias, the lack of contemporaneous record keeping, poor reporting and experts testifying beyond the contents of their reports. 47
In 2008, David Faigman surveyed the ‘anecdotal forensic sciences’, which he defined as forensic analyses based on inductive experience to develop and test their hypotheses, which contained a substantial degree of subjective judgment in application. 48 Faigman described these disciplines, which were both experience-based and subjective, as ‘particularly noxious to the truth’ because they were ‘largely invulnerable to falsification’. 49 These anecdotal disciplines included pattern-matching analyses like fingerprint, firearm, toolmark, bitemark and hair comparison, as well as arson investigations based on fire-pattern analysis. 50 Faigman found that many subjects of forensic identification had received little to no sustained research attention but rather were based on anecdotal experience and supposition. 51 He found that anecdotal forensics employed superficially objective methods to support their subjective judgments and were largely bereft of intellectual content. 52 He also found that courts continued uncritically to admit anecdotal forensic expert testimony despite its questionable validity. 53 He noted how judges lacked scientific sophistication and were crippled by embarrassment and fear of the consequences of excluding forensic evidence due to concerns with its reliability because of the length of its historical pedigree and the potential that rigorous scrutiny could call previous convictions into question. 54 He recommended that courts prohibit forensic experts from testifying to the conclusion of an individualised identification of a reputed source of a piece of evidence. 55
In 2009, the National Research Council of the American National Academy of Sciences (NRC) issued its comprehensive report,
In 2009, Brandon Garrett and Peter Neufeld explored the forensic-science testimony by prosecution experts in the trials of innocent people convicted of serious crimes who were later exonerated by post-conviction DNA testing. 57 They reviewed the trial transcripts of 156 exonerees, which contained testimony from 72 forensic analysts employed by 52 different laboratories. The forensic evidence involved in the wrongful convictions included microscopic hair comparison, bitemark comparison, shoeprint comparison, soil comparison, fibre comparison and fingerprint comparison. The study found that forensic analysts called as prosecution witnesses provided invalid testimony in 60% of the trials by testifying to conclusions that were based on either mis-stated or non-existent empirical data. It found that the adversarial judicial system largely failed to regulate the invalid testimony. It also found that, in the few cases in which the defence challenged invalid forensic science, judges rarely provided relief.
In 2011, Brandon Garrett conducted a systematic audit of the first 250 American DNA exonerees. 58 He documented the prevalence of unreliable and invalid forensic science in most of the exoneree's original convictions. 59 He found that many of the wrongful convictions were based on forensic sciences that involved the subjective comparison of items of evidence to determine whether they were ‘similar’, ‘consistent’ or ‘matched’. 60 He also found that judges failed to act as appropriate gatekeepers when forensic experts deployed unreliable methods or exaggerated their conclusions. 61 He documented how specific disciplines played an outsized role in the wrongful convictions, including microscopic hair comparison, bitemark comparison, shoeprint comparison, voice comparison and fingerprint comparison. 62
In 2012, an Expert Working Group on Human Factors in Latent Print Analysis (EWG) established by the National Institute of Standards and Technology and the National Institute of Justice issued their report,
In 2015, Barack Obama tasked the President's Council of Advisors on Science and Technology (PCAST), an advisory group of leading American scientists and engineers, with determining whether there were additional steps that forensic scientists could take, in response to the NRC Report, to ensure the validity of forensic evidence used in the American legal system. In 2016, PCAST issued its final report on the state of forensic science evidence in American criminal courts,
In 2018, Paul Giannelli examined the failure of the justice system effectively to screen unreliable forensic science by reviewing discredited techniques (bitemark analysis, microscopic hair comparison, fire pattern evidence and comparative bullet lead analysis) and techniques that have been misleadingly presented (firearm and toolmark identification and fingerprint examinations). 67 He found that the system had failed to demand empirical testing and properly evaluate foundational research. 68 He concluded that the justice system ‘may be institutionally incapable of applying
United Kingdom
In 2011, the Law Commission for England and Wales examined the admissibility of expert evidence in criminal proceedings. 70 The report found that the judicial approach to the admissibility of expert evidence in England and Wales was ‘laissez-faire’ and that too much expert opinion evidence was being admitted without adequate scrutiny. It concluded that miscarriages of justice were occurring due to unreliable prosecution evidence.
Australia
In 2008, Gary Edmond documented the failure of the jurisprudence governing the reception of expert opinion evidence in criminal trials in New South Wales. 71 He canvassed the leading admissibility decisions under the New South Wales Evidence Act 1995 interpreting the admissibility of expert opinion evidence. 72 He found that the State had been able to secure the admission of incriminating expert opinion evidence of unknown reliability, notwithstanding the statutory rules designed to regulate the admissibility of expert evidence. 73
He critiqued the case of
In 2012, Edmond and Mehera San Roque surveyed the concerns and limitations of traditional adversarial safeguards to manage incriminating forensic science. 77 They concluded that courts needed to adopt a more exclusionary orientation towards incriminating forensic science evidence and develop new mechanisms to manage its use during criminal proceedings.
In 2014, Edmond et al reviewed the threat posed by forensic science evidence being produced in conditions in which analysts are exposed to unnecessary contextually biasing information. 78 They found: In practice, most jurisdictions liberally admit incriminating expert opinion evidence. This lax approach applies even when the evidence is derived from techniques that have not been formally evaluated and where the analyst is not shielded from domain-irrelevant information. Laxity is pervasive in jurisdictions that (purport to) direct attention to reliability following At trial, the (often impecunious) accused is not in a good position to engage in methodological debates in order to elucidate real risks that may be trivialized by prosecutors, expert witnesses, jurors and even judges. Moreover, continuing admission and legal indifference mean that these issues will need to be re-canvassed in trial after trial rather than being addressed systematically by the institutions responsible for generating the evidence. 80
Aotearoa New Zealand
In 2020, Gary Edmond conducted a detailed review of legal challenges to latent fingerprint evidence in reported judgments in Aotearoa New Zealand over the past century. 81
He concluded that the judgments showed ‘remarkably little
In 2024, I documented how the New Zealand courts appeared to continue to engage in uncritical acceptance of under-validated scientific techniques that were increasingly coming under scrutiny in other parts of the world. 84 I concluded that the lack of validation studies demonstrating the reliability of the forensic analyses routinely used in Aotearoa New Zealand threatened the accuracy of the criminal-justice system.
Validation studies
Some of the techniques that have been criticised in earlier studies (e.g. latent fingerprint and shoeprint analyses) have undergone preliminary studies of their foundational validity. 85 For example, in 2011, the FBI commissioned a large-scale black-box study 86 to examine the accuracy and reliability of forensic fingerprint comparisons, documenting the low false-positive rates of the FBI's fingerprint technique. 87 Commentators have criticised these studies as inadequate to determine the validity of conclusions about whether a latent print was made by a given individual or the probative value of the limited feature sets observed in many lower-quality latent prints. 88 There have been some promising efforts to develop quantitative methods for estimating the probative value or weight of fingerprint evidence, 89 but they have not yet achieved as-applied validity. 90 It is important to note that the print comparisons in these studies were performed under laboratory rather than routine casework conditions and the studies were not test-blind (i.e. examiners knew that they were being tested), both of which limit their ecological validity.
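The arithmetic behind reporting a black-box study's false-positive rate can be sketched as follows. The error counts below are invented for illustration and are not the FBI study's actual figures; the Wilson score upper bound is one standard way of expressing how high the true error rate could plausibly be given a study of limited size.

```python
import math

# Hypothetical black-box study results (illustrative numbers only):
false_positives = 6   # erroneous 'identifications' of non-matching pairs
comparisons = 4000    # non-matching pairs examined

p_hat = false_positives / comparisons   # point estimate of false-positive rate

# Wilson score 95% upper bound on the true false-positive rate.
z = 1.96
denom = 1 + z**2 / comparisons
centre = p_hat + z**2 / (2 * comparisons)
spread = z * math.sqrt(p_hat * (1 - p_hat) / comparisons + z**2 / (4 * comparisons**2))
upper_bound = (centre + spread) / denom

print(f"point estimate: 1 in {round(1 / p_hat)}")         # 1 in 667
print(f"95% upper bound: roughly 1 in {round(1 / upper_bound)}")
```

The point of reporting the upper bound rather than the point estimate is that a jury told the error rate is "1 in 667" may not appreciate that a small study cannot rule out a substantially higher rate.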
Methodology
This study endeavours to quantify the failures of the New Zealand courts adequately to screen under-validated forensic-science evidence by documenting:
the types of forensic science offered in criminal cases; the party presenting the evidence; the experts’ affiliations; how often there was a challenge to the admissibility of the expert evidence and the timing of any challenge in the proceedings; how often experts rely upon the uniqueness assumption; how often experts testify to an individualised identification or ‘match’ of a source of forensic evidence; how often experts testify that the forensic discipline had an error rate of zero; how often evidence relating to human factors like cognitive biases and observer effects was introduced; and how often experts engaged in fallacious statistical reasoning in presenting their results and conclusions.
Forensic science terms
Published cases 91 involving forensic evidence were identified through a search of the three major databases of published criminal cases in Aotearoa New Zealand: the Westlaw CRNZ, Lexis Advanced, and New Zealand Legal Information Institute (NZLII) databases for cases involving forensic science testimony. To identify cases with forensic evidence, these databases were searched using Boolean searches for terms related to common forensic science specialties. 92 Because of the international nature of much forensic science research and practice, the search terms included both North American and British/New Zealand spellings and terminology. The search deployed acronyms when they were more commonly used than full terms (e.g. ‘DNA’ rather than ‘deoxyribonucleic acid’).
These search terms are contained in Appendix A. If search terms are in quotation marks in the Appendix, this indicates that the search was performed using quotation marks, so that search results would include only cases in which the precise terms were used in their entirety, per Boolean search rules.
Because a Boolean search for a partial phrase will also generate results with a longer phrase that contains the partial phrase, but a Boolean search for the full phrase will not generate results containing the partial phrase, short or partial phrases were often used (e.g. ‘print examiner’, which would also find cases containing the phrase ‘fingerprint examiner’). At other times, a longer phrase was used because a shorter phrase would likely have returned hundreds of irrelevant results (e.g. ‘firing pin’ rather than ‘firing’ or ‘validity’ rather than ‘valid’). Many individual cases came up in the search in response to multiple search-term queries. Many of these search terms also yielded no results (e.g. ‘IAFS’ or ‘infrared microscopy’).
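The partial-phrase behaviour described above can be illustrated with a minimal sketch. The matching here is plain substring containment, used only to show the logic; the commercial databases implement their own tokenised Boolean engines, and the example judgments are invented.

```python
# Quoted-phrase searching modelled as substring containment (illustrative only).
judgments = [
    "the fingerprint examiner testified to an identification",
    "a footprint examiner compared the exemplars",
    "the firearms expert examined the cartridge cases",
]

def phrase_search(corpus, phrase):
    """Return the documents containing the exact phrase."""
    return [doc for doc in corpus if phrase in doc]

# The partial phrase 'print examiner' also captures the longer phrases
# 'fingerprint examiner' and 'footprint examiner' ...
print(len(phrase_search(judgments, "print examiner")))        # 2
# ... but the full phrase 'fingerprint examiner' misses 'footprint examiner'.
print(len(phrase_search(judgments, "fingerprint examiner")))  # 1
```

This asymmetry is why short or partial phrases cast a wider net, at the cost of occasionally sweeping in irrelevant results.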
Important precedents involving the regulation of scientific evidence
In addition to searching these databases for common forensic science terms, three important precedential cases were also Shepardized 93 to locate cases in which scientific evidence was challenged or where the proponent felt it necessary to establish that the evidence complied with the validity and reliability requirements of EA s 25. Those three cases were:
Resulting dataset and coding
The result of these queries was a dataset of 148 published cases. The cases were coded for the following variables: case name and citation; type of forensic science; 97 the party presenting expert evidence; the expert's affiliation; whether there was a challenge to the admissibility of the expert evidence and the timing of any challenge in the proceedings; whether the expert relied upon the uniqueness assumption; 98 whether the expert testified to an individualised identification or ‘match’ of a source of forensic evidence; 99 whether the expert testified that the forensic discipline had an error rate of zero; 100 whether there was evidence relating to human factors like cognitive biases and observer effects; 101 and whether either the expert or the proponent of the evidence engaged in the anterior probability fallacy (the so-called ‘prosecutor's fallacy’). 102
Limitations
Due to the inability to access Police, ESR, or court records, this study was limited to an analysis of published judgments and did not include the underlying expert evidence. As a result, it is not possible to verify through triangulation the descriptions of expert evidence contained in published judgments by comparing those descriptions to the underlying evidence (written expert reports or live testimony). A study limited to published judgments cannot detect even well-documented problems in forensic science unless they happen to have been raised, extensively litigated by the parties, and formed a part of the judgment – for example, exposure to domain-irrelevant, contextually biasing information 103 and the absence of blinding protocols during test performance. 104
An additional limitation arises from the nature of the judgment-publication process. Because appellate judgments are published more often than interim judgments relating to admissibility hearings, and because judgments have generally become more publicly available over time, databases of published judgments tend to consist disproportionately of appeals and of relatively recent decisions.
The study defined ‘forensic science’ narrowly to include physical sciences but not forensic medical examination or forensic psychology/psychiatry evidence, not because these areas of forensic evidence cannot or do not have reliability problems, but rather because they tend to have different types of reliability issues than the more traditional pattern-matching disciplines that have been the focus of critiques internationally. The reliability of forensic pathology and forensic psychiatric testimony are topics worthy of study in their own right, but those disciplines do not make the types of individualisation claims that the physical forensic sciences do.
Findings
Common types of forensic evidence
The study examined the use of 30 common forensic disciplines. It also coded for ‘other’, if a case involved a type of forensic science not anticipated by researchers. The list of type codes is contained in Appendix B. Many cases involved multiple types of forensic science evidence and were coded accordingly.
The most common types of forensic evidence were traditional DNA testing, handwriting analysis, gas chromatography/mass spectrometry, palm and footprint 105 comparison, fingerprint comparison, blood spatter pattern analysis and firearm/fired bullet comparison. 106 Less common techniques included Y-STR DNA testing, low copy number DNA analysis, complex DNA mixture analyses and mitochondrial DNA comparison. Other forms of trace/microscopic evidence analysis were infrequent, including glass comparison (three cases), hair comparison (two cases), 107 fibre comparison (two cases), toolmark comparison (two cases) and facial mapping (two cases). There were several common forensic disciplines for which we could not find expert evidence in the dataset: microscopic paint comparison, tyremark comparison, forensic odontology/bitemark comparison 108 and lie detection evidence.
Several of the common forensic disciplines in the criminal justice system in Aotearoa New Zealand have been the subject of critique and criticism internationally. For example, PCAST found that there were no appropriate studies to establish the foundational validity of shoe print comparison and that expert claims to have identified a shoe associated with a particular impression were ‘unsupported by any meaningful evidence or estimates of their accuracy’. 109
Party offering forensic evidence
The prosecution offered forensic evidence in far more cases than the defence. There were 95 cases in which only the prosecution offered forensic evidence, 50 cases in which both the prosecution and defence offered forensic evidence, and only two cases in which the defence alone offered forensic evidence.
Expert affiliation
By far the most common source of expert evidence was either ESR/DSIR 110 (68 cases) or an in-house forensic unit at the New Zealand Police (42). Other common sources of expert evidence were academic experts (35) and domestic private forensic consultants (31). There were also 30 cases in which the expert came from abroad, presumably because there was insufficient domestic expertise or capacity in Aotearoa New Zealand in the particular discipline. These expert affiliations tended to align with the party presenting the evidence, with evidence from ESR and Police experts being offered almost exclusively by the prosecution and evidence from academic experts and private consultants being offered more often by the defence.
Defence challenges to forensic evidence
In 37 cases, the defence never challenged the reliability or admissibility of the prosecution's forensic evidence. In the remaining 111 cases, the defence challenged the evidence prior to trial in 30 cases, at trial in three cases, and on appeal in 78 cases.
Source attribution: Individualisation and uniqueness
In 106 cases, judgments describe the forensic expert as testifying either to the uniqueness assumption (testimony to the effect that the crime-scene evidence and suspect exemplar are so unique that no two sources could have left behind indistinguishable evidence at the scene) or to an individualised match determination (testimony to the effect that a suspect item of evidence matches a known exemplar to the exclusion of any others, such that the suspect item of evidence must have been created by the known source).
Error rates 111
In none of the 148 cases in the dataset did the judgment describe the expert as testifying that their technique or conclusions had an error rate of zero. This finding is reassuring, given that ‘zero error rate’ testimony has historically plagued forensic science in other jurisdictions. 112
Statistical fallacies
Three of the 148 cases involved forensic evidence that, according to the judgment, either contained a statistical fallacy or was used or interpreted by the party offering the evidence (e.g. in closing address) in a manner that evidenced an error in statistical reasoning.
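The prosecutor's fallacy – transposing the conditional by treating the probability of a match given innocence as the probability of innocence given a match – can be made concrete with a small sketch. The random-match probability and the population of plausible alternative sources below are assumed purely for illustration.

```python
# Illustration of the 'prosecutor's fallacy':
# confusing P(match | not the source) with P(not the source | match).
random_match_prob = 1e-6      # assumed P(evidence matches | person is not the source)
population = 1_000_000        # assumed pool of plausible alternative sources

# Fallacious reading: "the chance the defendant is innocent is one in a million".
fallacious_innocence = random_match_prob

# Correct reading: among 1,000,000 people who are not the source, we expect
# population * random_match_prob coincidental matches, plus the one true source.
expected_coincidental = population * random_match_prob          # = 1.0
prob_not_source_given_match = expected_coincidental / (expected_coincidental + 1)

print(fallacious_innocence)         # 1e-06
print(prob_not_source_given_match)  # 0.5
```

With these assumed numbers, the same "one in a million" match statistic is consistent with a 50% chance that the matching person is not the source, which is why transposing the conditional can be so misleading to a fact-finder.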
Analysis of significant findings
Common disciplines
The ongoing prevalence of expert evidence relating to latent fingerprint analysis and firearms analysis in the dataset is concerning given the widespread international criticisms of their validity. Regarding latent fingerprint analysis, PCAST noted: ‘The method was long hailed as infallible, despite the lack of appropriate empirical studies to assess its error rate.’ 113 It noted that latent fingerprint analysis had ‘a false positive rate that is substantial and is likely to be higher than expected by many jurors based on longstanding claims about the infallibility of fingerprint analysis’. 114
Timing of defence challenges
The data suggest that, in the majority of cases in which the defence challenged the admissibility of forensic-science evidence offered by the prosecution, the challenge only occurred on appeal. This is consistent with findings internationally about the failure of defence lawyers effectively to challenge potentially unreliable forensic-science evidence.
In the United States, Garrett and Neufeld found that defence counsel rarely cross-examined analysts about invalid testimony or obtained independent experts. 115 In the United Kingdom, the Law Commission found that lawyers did not cross-examine experts effectively to reveal potential flaws in experts’ methodology, data and reasoning. 116
In Australia, Edmond et al (2014) note: ‘In practice, it becomes the responsibility of the defence lawyer to identify, explain and successfully convey evidentiary limitations with expert evidence to the trier of fact (and an envisioned court of appeal, should the accused be convicted).’ 117 They explain: ‘Because of the way trials are resourced and operationalized, “in practice”, trial safeguards and protections have not made trials featuring incriminating expert evidence fair, nor have they placed the trier of fact in a position conducive to the rational assessment of the expert evidence.’ 118
These failures are likely a function, at least initially, of the inadequacy of legal aid funding for defence investigation and independent experts, particularly in conjunction with the high caseloads that tend to result from chronic underfunding. I have previously written about the way that inadequate funding and high caseloads contribute to ineffective assistance of counsel in the United States by combining to create poor management and culture in public-defence organisations. 119 I have documented how ineffective assistance of counsel tends to be a long-term problem that develops through three stages. 120 At the first stage, political entities underfund legal aid when appellate courts fail to hold the system to strict standards of defence effectiveness. 121 In the second stage, after the underfunding and overloading of defence lawyers have continued over time, defence lawyers institutionally lack any awareness of what adequate funding, reasonable caseloads, and zealous representation even look like. 122 In the third stage, a culture of defensiveness sets in, which leads to rationalisations about the futility of fighting for structural reform and an acceptance of poor standards of practice. 123 Similar phenomena have likely occurred in Aotearoa New Zealand, preventing defence lawyers, in the majority of cases in which the prosecution presents questionable scientific evidence, from challenging it prior to appeal.
Source attribution
Examinations of features for the purpose of identification can lead to inclusions and exclusions. An inclusion increases the probability that a trace originated from a particular source within the set of possible sources, and an exclusion decreases this probability to essentially zero. 124 In the context of forensic analysis, ‘identification’ refers to the association, to some degree of probability, of a trace to a source. 125 One type of identification is the attribution of a trace to a single source (a source attribution). 126
The classical theory of pattern-matching forensic disciplines is that individualisation is possible regardless of how many possible sources there are for an item of forensic evidence. 127 The theory holds that a well-trained examiner can ascertain when any item of forensically significant evidence displays enough features to distinguish it from every sufficiently complete and clear impression of any other possible source. 128 This is a theory of ‘universal individualisation’ based on a premise of ‘general uniqueness’. 129 The theory of universal individualisation claims that a single individual must be the source of an item of trace evidence because no other individual in the world could have produced such a trace – that is, that the individualisation rules out every other possible source of the evidence. 130
In practice, a universal individualisation means that the examiner is confident that if impressions from any other possible source could be compared with the crime-scene evidence, not one would match. 131 The individualising examiner effectively sets the size of the population of possible sources to its maximum regardless of the specific circumstances of the case. 132 The individualisation process moves from this maximum initial population of possible sources to a source determination. 133 At the end of the examination process, the quantity of features observed in agreement between two objects (without significant discrepancies) is perceived as so impressive that the examiner has ruled out the possibility of a coincidental match, whatever initial population of sources was involved. 134 It means that the identification of the source is to the exclusion of all other sources. 135
While the results of this study indicate that forensic scientists in Aotearoa New Zealand do not (or no longer) testify to ‘zero error rates’ in their forensic techniques, they do continue to make findings of individualisation based on the untested and unproven assumption that a particular type or number of features establishes a unique and detectable signature. 136 There are no validation studies to support the claims that the probability that an item of crime-scene evidence was made by another, different source is practically zero, or that a match between a crime-scene impression and a suspected item of evidence can be individualised to the exclusion of all other sources. 137 PCAST specifically recommended that courts prohibit forensic examiners from testifying that they could identify the source of an item of crime-related evidence to the exclusion of all others or to a degree of scientific certainty. 138
Prosecutorial discretion
It appears that prosecutors in Aotearoa New Zealand regularly offer expert evidence from forensic disciplines that lack foundational validity, as-applied reliability, or both. In 2016, PCAST recommended that prosecutors and forensic scientists in the United States ‘should not offer testimony based on’ any forensic method that lacked empirical studies and/or statistical models to establish its accuracy.
Recommendations
Courts should require a demonstration of foundational validity and as-applied reliability as a precondition to the admissibility of any purported scientific evidence. This validity should be established through black-box testing and validation studies to demonstrate the reliability of the techniques or the process of deduction, 139 which generate established rates of error and are published in peer-reviewed academic journals. Validity and reliability also require rigorous proficiency testing and the disclosure of test results as part of routine pretrial discovery processes. This disclosure should include the known rates of error.
Measuring the incidence of errors serves three functions. 140 First, studying error is an integral part of science. A basic tenet of experimental science is that errors and uncertainties exist that must be reduced by improved experimental techniques and repeated measurements, and those errors remaining must always be estimated to establish the validity of results. 141 Second, measuring and tracking error rates is part of a comprehensive quality control and assurance system. 142 Third, in the legal system, quantified error rates are a consideration in judging the admissibility of findings or the weight that should be given to them. 143 Quantified error rates not only can lead to improvements in the reliability and validity of current practices but could also assist in more appropriate use of the evidence by fact-finders. 144
Law reform could be accomplished in one of two ways. Parliament could amend EA s 25 explicitly to require foundational validity and as-applied reliability as part of the ‘substantial helpfulness’ requirement for expert opinion evidence. For example, after the Supreme Court of the United States issued its decision in
In the alternative, courts could adopt an explicit validation requirement in their interpretation of existing EA s 25. In cases like
Until appropriate validation studies have been completed, courts should impose a moratorium on the admission of forensic evidence based on pattern-matching disciplines that lack sufficient validation studies to establish both the foundational validity of their underlying methods and the as-applied accuracy of the particular technique and examiner. Courts should not permit forensic examiners to testify that they can identify a source of crime-scene evidence to the exclusion of all other possible sources on the basis of a subjective determination that the identified source shares sufficient characteristics with a crime-relevant impression or item of evidence.
Future research directions
This research does not attempt a qualitative evaluation of the depth and sophistication of defence challenges to potentially unreliable forensic evidence offered by the prosecution, owing to the limited nature of the reported cases from which it drew its data. It also does not attempt a qualitative evaluation of the exercise of prosecutorial discretion in producing and presenting forensic evidence, particularly relating to pattern-matching disciplines that currently lack validation studies, known error rates and a track record of academic peer review. Both areas of research are needed for a fuller picture of the way that the adversarial system of justice in Aotearoa New Zealand intersects with the reliability concerns about some types of forensic evidence documented in this study. As Faigman (2008) notes: ‘Most lawyers have little training in the methods of science, and criminal defense lawyers – those charged with examining this testimony in court – often have overwhelming caseloads and undersized budgets.’ 146
Conclusion
The admissibility standards for forensic-science evidence in Aotearoa New Zealand are too lax, and courts have been too permissive in accommodating the Crown's expert evidence. Forensic analysts in Aotearoa New Zealand routinely employ many of the techniques and practices identified in the NAS Report, the PCAST Report and in scholarly critiques abroad.
Moving forward, Aotearoa New Zealand should develop a more robust domestic evidence law around screening expert scientific testimony for sufficient foundational evidence of reliability, particularly considering the documented relationship between faulty scientific evidence and false convictions abroad.
Reforming forensic science here is important because of its established role in wrongful convictions internationally. 147 The failure of courts, lawyers and forensic scientists to screen and validate scientific testimony has been linked to hundreds of false convictions. 148 The time has come to restrict the admissibility of forensic science until methodologically sound experimental research confirms (or dispels) its validity and reliability. 149
Footnotes
Author’s note
This research was published with the assistance of a grant from the Michael and Suzanne Borrin Foundation. The author is deeply grateful to her team of research assistants: Rayhan Langdana, Ella Maiden, Christopher McCardle, and Ben Christy.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the New Zealand Law Foundation / Borrin Foundation.
