Abstract
This article addresses a serious, but currently unacknowledged, problem of evidential consistency regarding police-suspect interview evidence. It sheds light on flaws in current criminal procedure through the lens of linguistics, focusing on key stages of currently accepted practice which fly in the face of what linguists have long known about language. It demonstrates that, in stark contrast to the strict principles of preservation applied to physical evidence, interview data go through significant transformation between their creation in the interview room and their presentation in the courtroom, especially through changes in format between written and spoken text. It argues that, despite the safeguards provided by PACE 1984, there is nonetheless a level of routine distortion and contamination unintentionally built into the current system of presenting police interviews as evidence in England and Wales.
Introduction
Police investigative interviews are a vital part of the criminal justice process: they are an essential element of the investigative evidence-gathering process, and in many jurisdictions will also go on to form an important piece of evidence in court. The formal records of these interviews are therefore of great significance. In the England and Wales (E&W) context, the legal framework is such that the credibility of a witness can be destroyed by counsel highlighting differences between what is said in court and what was (recorded as being) said at interview. In a skilful cross-examination this can discredit their entire evidence, not just the often minor part which is (apparently) inconsistent. The effect can be devastating, especially for defendants, and the accuracy of interview records is therefore crucial.
Yet despite this, there are real causes for concern in the current treatment of E&W interview data and the methods by which interview records are produced. I will argue that the data are (unintentionally) distorted and misinterpreted as they pass through the criminal justice system. In stark contrast to the strict principles of preservation applied to physical evidence, this article will show that interview data go through significant alteration and contamination along the route from interview room to courtroom. They undergo various transformations in format, being converted between spoken and written modes and subject to various other processes along the way. Troublingly, the legal system treats all the different versions as unproblematic ‘copies’ of the original. This article will critically examine this process, highlighting the serious implications in terms of interference with criminal evidence; something which is currently entirely unrecognised in the criminal justice system.
To illustrate the issues under discussion, this article will focus on E&W police interviews with suspects. Procedures vary across other jurisdictions and data types, but the principles discussed here, and the problems highlighted, have much broader relevance and applicability.
Data and method
The work presented here draws on three distinct strands and data sources, in order to provide a thorough analysis grounded in both academic theory and professional practice. It thus incorporates multiple perspectives, both internal and external to the process being scrutinised. The initial starting point was the author’s previous experience as a barrister in E&W, practising criminal law for both prosecution and defence. Barristers gain direct insight into both sides of the criminal justice process, thus acquiring a professional understanding of what is required for the police and Crown Prosecution Service (CPS) to build a sufficiently robust case to the required evidential standards, along with practical experience of dismantling the same. It was my experience of presenting transcripts of police interviews in court as part of the prosecution case, and professional concern over the integrity of this process, which led to investigating this further from a theoretical, academic perspective.
Second, it analyses data in the form of audio recordings and transcripts of police-suspect interviews. This dataset includes both the official police transcripts and my own transcripts of the audio data. The majority of these were collected from several police forces as part of a wider research project (Haworth, 2009), 1 with permission granted to use these data for research purposes. All personal or identifying material has been anonymised, and only data from closed cases were collected, to avoid any possibility of interfering with ongoing proceedings.
This dataset also includes data from the Harold Shipman case in 1999–2000. Due to the high-profile nature of his trial for murdering 15 of his patients, audio recordings of two of his police interviews were made available to the press. 2 In addition, the subsequent public inquiry 3 made available a full transcript of the lengthy trial, 4 including the parts where these interviews were presented to the court as evidence. This therefore presents an unusual opportunity to observe the same police interview data being recontextualised at different stages of the criminal justice process (although that is not a primary objective here—for deeper analysis of this specific dataset, see Haworth, 2006, 2010).
For the third strand, in order to gain insight into professional practice regarding interview transcription, a visit was arranged to a group of transcribers (known as ‘ROTI 5 clerks’) within a police force with whom I already had research contact. They had agreed to assist our Master’s students with a research project, and I took the opportunity to conduct an informal focus group discussion with the clerks about their work, alongside ethnographic observation of their working environment. This particular pool of ROTI clerks is based within a police divisional headquarters, with their working environment being one area of a large open-plan office for police personnel. Apart from being physically positioned slightly separately from other staff, and not wearing police uniform, the clerks clearly identify as, and are treated as, police employees. It is difficult to ascertain how typical this set-up is across other police forces, and so this is included to provide context, not as a description of standard practice. However, it appears that ROTI clerks at other forces are also routinely police employees. (It no longer seems to be practice anywhere for interviewing officers to transcribe their own interview recordings. 6 )
Also present was the senior police officer who had facilitated our visit. He is a highly experienced interviewer and interviewer trainer, and was already known to at least some of the ROTI clerks due to having been involved in their training many years previously. However, he did not appear to have any direct role in their work in terms of job oversight or line management. Students were also present for the latter part. This group discussion was audio-recorded and transcribed, and this forms a further dataset to be drawn upon in this article. Consent was obtained from all participants to record the discussion, and for the publication of anonymised excerpts.
It should be emphasised that the interview transcripts presented here were collected from a different police force from the one employing the ROTI clerks who took part in the discussion group. Thus the extracts presented here are not their work. Data are deliberately included from various sources in order to demonstrate that the issues being highlighted are by no means limited to one force or local practice. It was also considered important not to hold up the work of those individuals for scrutiny and criticism, after they had generously given up their time to offer a rare insight into this overlooked and underestimated task.
My transcripts
In order to present my data here, it was necessary to subject them to the same processes which are under scrutiny in this article, namely to convert spoken text into a written format which can be easily shared and used for the intended purpose. In a further step away from the primary (spoken) source, in order to present data from the courtroom it was necessary to rely on transcripts already produced by others, and with a different purpose in mind. This neatly illustrates why the ‘politics of transcription’ (Bucholtz, 2000) have exercised linguists for such a long time. The approach taken here is to produce transcripts which include the level of technical detail appropriate to their particular purpose (following e.g. Cook, 1990; Lapadat, 2000). Thus, my transcripts of interview audio data include details such as marking the precise period of overlapping talk, timing pauses and similar, whereas my transcripts of the group discussion with ROTI clerks are much simpler, since such features are not analytically relevant there. The aim throughout is to present enough transcription detail to support the point being made without adversely affecting readability. It is fully recognised that this is in itself an interference with the data, and that entirely different decisions could have been made. Indeed, to many linguists these transcripts will seem unacceptably ‘light’. However, the intention is that this approach to some extent leads by example, showing how different approaches to transcription can be used to match up to the practical realities and user requirements of the system under discussion.
Background and context
We will begin by setting out the underlying context for E&W police-suspect interview evidence. This is an important first step as different jurisdictions have widely different approaches to conducting and recording police interviews, and in fact the E&W system is often regarded internationally as being a leader in this respect. It is also essential that any critique of the process properly takes into account the legal framework which underpins it.
Police and Criminal Evidence Act 1984 (PACE)
The most important piece of legislation in the E&W police interview context is the Police and Criminal Evidence Act 1984 (PACE), which brought about wholesale changes in police procedure. It was introduced partly in response to a series of high profile miscarriages of justice, including several cases in which evidence of police interviews with suspects had been corrupted, or indeed altogether fabricated. The reputation of the police force, and public trust in its integrity, were at a low point. It was recognised that there needed to be fundamental change in the way the police conducted themselves, and a Royal Commission was set up. As Brown reports:
[PACE] is the direct outcome of the Royal Commission on Criminal Procedure’s (RCCP) recommendations for systematic reform in the investigative process. The provisions of the Act are designed to match up to principles of fairness (for both police and suspect), openness and workability. Overall, they are intended to strike a balance between the public interest in solving crime and the rights and liberties of suspects. (Brown, 1997: ix)
Prior to PACE the audio-recording of interviews had been the subject of much debate, and fierce resistance by the police. In 1985, Baldwin commented on ‘the intransigent opposition to the idea that has been evident for many years in all levels of the police service’ (1985: 695–696). But he also observed an ‘extraordinary volte-face on the part of the police service on the tape recording question’ (1985: 695) at that time. He cites several reasons for this marked shift in favour of the use of tape-recording, including the results of successful field trials. Many of the fears which had been expressed in police circles, such as suspects being less willing to talk, failed to materialise, and, perhaps more significantly, it was observed that ‘tape-recording is rapidly coming to be viewed by officers involved in the field trials as of greater assistance to the prosecution than it is to the defence’ (Baldwin, 1985: 702). The provisions of PACE place a significant onus on the police not only to act fairly, but also to ensure that they are seen to be acting fairly at all times. The introduction of the requirement to audio-record interviews has therefore been extremely helpful to the police in this respect. Indeed, despite the initial resistance, it is now widely regarded within the force as a vital safeguard to protect themselves from accusations of malpractice. In fact it is a widely held view within the legal system that audio-recording has solved all problems with regard to potential corruption or contamination of interview evidence, intentional or otherwise.
However, this focus on the audio-recording of interviews as a tool to avoid both deliberate malpractice and false accusations of malpractice has unfortunately drawn attention away from potential problems with the recording process itself. Audio-recording is indeed a successful solution to the original problems PACE was intended to overcome, and a vast improvement on prior practice, but it raises new problems of its own which have not been adequately recognised. In particular, it gives rise to another type of potential corruption of interview evidence: distortion of the interview data through the current process of recording, transcribing, summarising and presenting the data as evidence in court. This process, and the potential for corruption of evidence that it creates, are therefore the focus of this article.
Criminal Justice and Public Order Act 1994, s. 34
Another highly relevant piece of legislation is s. 34 of the Criminal Justice and Public Order Act 1994 (s. 34 CJPOA). This significantly altered the way in which interview data are interpreted in the judicial process. The relevant part of s. 34 CJPOA for present purposes is as follows:
Where, in any proceedings against a person for an offence, evidence is given that the accused—
at any time before he was charged with the offence, on being questioned under caution by a constable trying to discover whether or by whom the offence had been committed, failed to mention any fact relied on in his defence in those proceedings;…
being a fact which in the circumstances existing at the time the accused could reasonably have been expected to mention when so questioned,…
… the court, in determining whether there is a case to answer; and the court or jury, in determining whether the accused is guilty of the offence charged, may draw such inferences from the failure as appear proper.
In other words, if a suspect fails to mention a ‘fact’ during their police interview, and this fact is later relied upon as part of their defence, the court or jury is entitled to ‘draw inferences’ as to why they did not mention it sooner. It is thus extremely important to ensure that every significant part of a person’s defence is mentioned at the interview stage, in order to avoid potentially triggering the effects of s. 34 CJPOA. But for this provision to operate fairly, it is essential to be able to establish exactly what was said at interview, so that a valid comparison can be made. This is entirely dependent on the adequacy of the police interview record. The adequacy and accuracy of those records are, however, open to doubt, as we shall see.
Interviews as evidence
Interview data in E&W have an unusual and somewhat challenging dual evidential status: on the one hand they are a means of evidence-gathering, and on the other they form a piece of evidence in themselves. It is this latter role as evidence that leads to the rather unusual processes undergone by interview data after their production. In some respects, the story only really begins once the interview itself is over.
Interviews are recorded following the detailed requirements of PACE Code E, 7 which include a requirement to seal a master copy in the suspect’s presence (s. 2.2). Note 2A emphasises that the purpose of this ‘is to establish [the suspect’s] confidence that the integrity of the recording is preserved’, thus displaying a recognition of the importance of data integrity, even if this same principle is then violated with the working copy from this point onwards. The next step is generally for the working copy to be transcribed to produce the official written ‘Record of Taped Interview’ (ROTI). A copy of the ROTI will be sent to the defence—meaning the interviewee’s legal representative if they have one, or the interviewee themselves if not. The defence may also request a copy of the recording.
A primary use of the interview material, of course, is as a key source of information for the police at the time of their initial investigation, and for the defence in advising their client. It will also be used when a decision is made as to whether or not to charge the suspect, and if so, what with. The CPS are generally responsible for the final decision about whether or not a case will be proceeded with, taking into account factors such as whether there is a realistic prospect of conviction and whether it is in the public interest to prosecute (CPS, 2013). The interview forms part of the package of information on which they base such decisions.
Interview data go on to have a further role if the case comes to court. The recording and accompanying ROTI become part of the prosecution case. The contents of the interview are presented to the court as evidence, and are often used in some detail by the prosecution in cross-examination of the defendant. This is the point at which s. 34 CJPOA has its effect, with Prosecution Counsel frequently inviting the court to draw negative inferences from any (apparent) silences or omissions by the defendant during their police interview, and any (apparent) inconsistencies between what they said in court and at interview.
The following example, taken from the cross-examination of the defendant in R v Shipman, 8 demonstrates how much emphasis is often put on the exact words (apparently) used by an interviewee:
Example 1: Shipman Trial Day 33, official court transcript
Do you remember what you told the police about those blood samples?
Which part please?
You told the police, didn’t you, that you drove down to the surgery and delivered the blood samples and you then got on with the surgery?
I am not sure of the word ‘deliver’ but yes I did do that.
No?
If you are happy to say that it is deliver then I will accept it.
Let’s just have a look. We can do it quite quickly and therefore accurately if we have it in front of us and you will not be in any way disadvantaged. Page 22 please?
Yes. Thank you.
Bottom question, bottom answer rather, ‘Well, I drove down to surgery and delivered the blood samples and got on with the surgery.’ You see that?
Yes I do.
Turn 7 (‘Let’s just have a look…’) demonstrates Prosecution Counsel’s reliance on the written record of the interview, and his complete acceptance of its accuracy. We will now consider whether such precise and detailed scrutiny of the data at trial is in fact valid. Figure 1 represents the changes in format which interview data undergo from interview room to courtroom.

Figure 1. Format changes of interview data.
First we have the spoken interaction in the interview room. This original version is of course ephemeral and context-bound, experienced only by those immediately present and instantly lost. It is, however, audio-recorded 9 and thus we have its second incarnation in the form of the interview recording. It is important to note that even at this preliminary stage, the data have already changed. Listening to a recording is never the same as being present at the time; much of the contextual information, including all visual cues, is already lost. The transcription of this audio then creates a further version in the form of the formal written transcript (ROTI). This is perhaps the most significant change undergone by the data. Yet the criminal justice system does not currently recognise that this process transforms the data at all. Instead, from this point onwards the transcript, rather than the audio, is relied on almost exclusively. Indeed at the courtroom stage, although technically the recording is regarded as ‘real’ evidence, 10 transcripts are admissible as ‘copies’, 11 meaning that they can be used as a straightforward substitute, officially sanctioned as interchangeable and (in essence) identical. Yet rather than simply presenting the transcript (or recording) as evidence, standard practice is for the interview transcript to be read out loud in court by a police witness and the prosecutor, thus creating yet another version which involves a further conversion from written to spoken mode.
One further stage to mention, although it will not be discussed here, is the production of the transcript of the court proceedings. This results in the version of the interview data which is read out in court being converted into yet another written version, so a further transformation takes place. (For more on the process of court transcription, see Eades, 1996; Tiersma, 1999: 175–179; Walker, 1986, 1990.)
This whole process is problematic for a number of reasons. First, there are difficulties relating to the recording process; second, there is the problem of how to portray spoken language in a written format; third, there is the question of editing, as very few interviews are ever transcribed in full; and finally, there is the process of converting the data back into a (different) spoken form in the courtroom. Each of these areas will now be discussed in turn.
Problem 1: Audibility
The fact that the recording of police interviews is overt (as opposed to covert surveillance tapes, for example) should mean that there are few difficulties in terms of recording quality. Interviews take place in a quiet, controlled environment, with the recording device prominently situated between participants, all of whom are made aware of the need to express themselves clearly and audibly ‘for the tape’. Unfortunately, however, such difficulties do arise. Interview recordings are often inaudible in places, or at least unclear. I have even been handed a cassette tape for research purposes, still part of the police case file, which was entirely inaudible. Given that this is a piece of criminal evidence, such a loss is clearly unacceptable, arguably akin to accidentally wiping fingerprints from a crime scene. Yet it certainly does not appear to be treated with a corresponding level of seriousness, especially in less extreme cases where recordings are only partially unclear. Much of this lack of quality and reliability seems to stem from the use of old-fashioned cassette tapes as the recording medium, a format which is now virtually obsolete in all other contexts. Thankfully, these are now in the process of being replaced by digital recording in the E&W police interviewing context, 12 but the switch has not yet fully been made, and many thousands of cassette tapes still exist in evidence files. But even with advances in recording methods, audibility issues still remain.
Leaving aside the practical limitations, it is important to recognise that even with the best-quality audio recording, it is still virtually impossible to create a perfect transcription. Fraser (2003) sets out the aspects of human speech and speech perception which affect our ability to perform this task. She describes the inherent difficulties as follows:
The reason for our normally effective perception is that in face-to-face communication we know how to judge the accuracy of our perception, how to question it if it is doubtful, and how to correct it if it is inaccurate. These are exactly the steps that are necessary in creating accurate transcripts. The problem is that in transcribing from a recording we are not in an ordinary communicative situation, with a meaningful context, and the speaker present to correct any important errors. Rather we are abstracted from the real situation… (Fraser, 2003: 216)
Indeed Coulthard and Johnson (2007) cite two striking examples of transcribers ‘hearing what they expected rather than what was actually said’ (2007: 144). In the first, ‘an indistinct word, in a clandestine recording of a man later accused of manufacturing the designer drug Ecstasy, was mis-heard by a police transcriber as “hallucinogenic”…whereas, what he actually said was “German”’ (2007: 144–145). In the other, ‘a murder suspect, with a very strong West Indian accent, was transcribed as saying in a police interview that he “got on a train” and then “shot a man to kill”; in fact what he said was the completely innocuous and contextually much more plausible “show[ed] a man ticket”’ (2007: 145).
Thus the transcriber adds their own layer of interpretation to the original data, even with a relatively straightforward transcription of uncontentious audio material. And as the quality of the recording drops, the amount of interpretation will increase. The problem is that this ‘tampering with the evidence’ is completely invisible to anyone who subsequently reads the transcript. There is, of course, always the option to listen to the original audio file, but as already noted this rarely seems to happen once an official transcript has been produced, and would seem even less likely if there is no indication of uncertainty in the transcript. Further, if a listener has already read a transcript this may have a priming effect on how the data are heard (see Fraser, Stevenson and Marks, 2011), making errors substantially more difficult to perceive. Contamination has already crept in.
Problem 2: Transcription
The conversion of spoken data into a written format is a highly problematic process, for reasons which extend well beyond the practical difficulties of audibility just discussed. As is well established in the field of linguistics, spoken and written language are fundamentally different modes, which are not directly equivalent (e.g. Biber, 1988; Biber et al., 1999; Halliday, 1989). Conversion from one mode to the other is therefore a process of translation and interpretation, which is necessarily subjective and inexact. Bucholtz therefore cautions that ‘[a]ccuracy is of course an important goal in transcription, but it is also, in the end, an impossible one’ (2007: 789). Transcription has long been recognised as a major methodological challenge by linguists, who themselves need to render spoken data accessible to readers (e.g. Bucholtz, 2000, 2007; Edwards and Lampert, 1993; Leech, Myers and Thomas, 1995; Ochs, 1979; see Jefferson, 2004 for the most commonly followed transcription conventions in linguistic studies of spoken text). Indeed Bucholtz describes transcription as ‘an inherently and unavoidably sociopolitical act’ (2007: 802). Yet as already observed, there is no recognition whatsoever of these issues within the legal system, which has instead chosen to treat recordings and their transcripts as essentially identical pieces of evidence.
In terms of specific problems with the transcription process, Walker, in her study based in part on her own experiences as a court reporter, notes that ‘[o]f all the features that distinguish writing from speech, the one which is potentially the most significant in transcription, is the inability of our writing conventions to express some of the para- and extralinguistic signals that speakers rely on to get their meaning across’ (1990: 208). She gives examples of paralinguistic features such as ‘intonation, breathiness, emphasis, high and low pitch, long, drawn out sounds’; and of extralinguistic features such as ‘raised eyebrows, outflung arms, nods, sneers, and smiles’, which ‘can convey meaning on their own or alter the significance of the words they accompany’ (1990: 208). She goes on to point out that, ‘given that the printed medium is one-dimensional, none of these meaning-bearing contextual components of speech can be represented by using English orthography alone…Without the freedom to go beyond orthography, a sometimes-critical component of communication can fail to be passed along in written form’ (1990: 208).
In the police interview context the transcriber is, unlike the court reporter, not present at the time of the production of the original data, so any meaning conveyed by extralinguistic features is already lost before the transcription process even begins (unless described verbally by someone present, although that is still only a partial substitute, mediated through that participant’s perspective and personal interpretation). With regard to paralinguistic features, it is open to the transcriber to attempt to portray them in their transcript, but this is rarely seen. As Gibbons notes, the visual representation of such features within a written transcript tends to make the end result extremely difficult to read. He describes this as ‘a tension between two incompatible and competing criteria for transcription’, namely ‘readability’ and ‘accuracy’, and acknowledges ‘[t]he impossibility of simultaneously meeting these criteria in a single version’ (2003: 30). He observes that:
In reality most of the transcripts produced in courtroom and police contexts, although they purport to be ‘verbatim’, are heavily weighted towards readability. The process of transforming speech into a readable form can involve radical change. (Gibbons, 2003: 31)
Walker makes a similar point:
A transcript on which a reporter has exercised this kind of editorial artistry—one in which grammar has also been corrected, false starts removed, and syntax rearranged—is undeniably more readable than its verbatim version. It is also a transcript in which reality has undeniably been transformed. (1990: 232)
During our discussion, it became clear that the ROTI clerks had received no training in any of the aspects of the transcription process raised here, nor is there any guidance available for them to follow. Instead, they had each developed their own practice. For example, when asked how they deal with pauses, the responses were as follows:
Example 2
no we don’t. no. we just ignore the pauses.
cos I’ve seen varyi- I’ve seen people, I’ve seen people write- put- put pauses in. and others don’t.
if he doesn’t reply I just put down defendant refuses to reply.
yeah.
if somebody’s saying something and then they pause for quite a long time, how would mark that?
no, we don’t do,
I just put like kinda like long dots so that- and then what he says. you know like a row of dots.
everybody does things differently!
you do get variations, and it’s down to individual flair isn’t it,
yeah!
(ROTIgroup_turns402–411).
This illustrates the wide variety of current transcription practice, even within the same small group of transcribers working closely alongside each other. What is of concern is that this appeared to be seen as not only acceptable, but perhaps even desirable, with similar references elsewhere to developing an individual style with experience. Yet the potential importance of pauses in interview interaction was illustrated shortly after by the police participant:
Example 3
when you come up against a no comment interview, I deliberately leave about 10 seconds. when
(ROTIgroup_turn489).
As this demonstrates, a great deal of interactional work can be done through silence, especially in the police interview context (see e.g. Heydon, 2011), and so its treatment in an official transcript matters. Its absence from a transcript of this interaction would create an entirely different impression to what was actually experienced by the participants. Yet there were more fundamental inconsistencies between the ROTI clerks in how they would record a ‘no comment’ interview such as this. R1 stated that her approach is:
No comment? One liner! […] “The following questions were put to whoever, to which he replied, no comment to all.” And just list the questions. (ROTIgroup_turns222–224)
Later in the discussion, a rather different approach was described:
Sometimes we do put like the no comment in if they do say no comment we do kinda (list) it depends on the way the erm- interview’s being run. […] kinda thing so if he’s just like, not really listening and he’s just listing off a load of questions, then we’ll do that, but if there’s erm a lot of evidence and things what are being produced, then I- I don’t know about anybody else but I put actual no comment in after the defendant. (ROTIgroup_turns246–249)
Example 4
[…] if they don’t respond I usually put, erm is it “no audible reply”
“no response” or something yeah
“no response” or “no audible reply”, cos obviously you don’t know, you just can’t hear them reply to it, […] so you are putting something you’re not just leaving it
[…]
that’s what
(ROTIgroup_turns257–269).
There are interesting discrepancies between the three choices of phrase mentioned here, although they are apparently seen as equivalent by the three transcribers. ‘No audible reply’, as R2’s explanation highlights, differs from ‘no response’ in that it allows for the possibility that something was said but simply couldn’t be heard on the recording. The specific mention of ‘audible’ also allows for there having been a visual indication of response, a feature which would only be available if an interviewer actively chose to describe it ‘for the tape’ (something which cannot be relied upon: see Haworth, 2013). This is therefore potentially a more even-handed way of representing the data. Equally, it could be argued that it adds ambiguity where none existed. Given the evidential significance of whether a question received a reply or not, thanks to s. 34 CJPOA, this may open up a line of argument to the defence which would not otherwise have been available.
Of more concern is R1’s interjection at the end of this exchange. Unlike the other two agent-less descriptors, ‘defendant remained silent’ positions the interviewee as the agent of an active process, thereby framing the silence as a conscious act. It can be no coincidence that it also matches the wording of the caution (‘You have the right to remain silent…’), thus invoking the relevant legislative provisions. This is therefore considerably less neutral than the other two options. It is, however, more balanced than the alternative version which R1 mentions in Example 2 above: ‘defendant refuses to reply’ goes considerably beyond recording a pause; it inserts an active process of refusal by the interviewee, demonstrating how loaded an apparently simple transcription choice can be. (The accompanying label of ‘defendant’ will be discussed further below.)
Several of the features discussed in this section can be observed in the following example, which is a comparison between part of the official ROTI of an interview with a man being questioned on suspicion of affray (s. 3 Public Order Act 1986) (Example 5A), and my own transcription of the relevant passage from the police audio recording (Example 5B).
Example 5A: Official ROTI
What were you going to do with the knife?
Sweet sod all actually.
So why carry the knife?
I don’t even know why I picked it up.
Example 5B: My transcription of the same data
what were you gonna do with the knife?
(2.7)
sweet sod
sod all? (2.0) so why
(1.3)
I don’t even know why I picked it up.
It can be seen that there are several key omissions from the ROTI. First, it does not record the long pause before answering (B2), which as already noted can have a significant effect interactionally. Combined with the lack of intonation indicators in the IE’s reply (cf. B3), the response as represented in A2 appears much more confrontational than when listening to the recording. 13 Further, B4 shows that the IR repeats the IE’s mild profanity, but this does not appear in A3, matching the observations of Walker and Bucholtz regarding unequal representation of participants in such interactions. Similarly, the IR’s ‘gonna’ (B1), spoken with a regional accent, is ‘corrected’ to ‘going to’ in the ROTI (A1). These points may seem insignificant on their own, but in combination they amount to a distinctly different representation of the interaction, even over this short extract, demonstrating the impact which even seemingly minor transcription choices can have on the data.
Problem 3: Editing
Alongside the smaller-scale changes described above, most interviews are subject to a much more substantial editing process. A typical interview record (ROTI) is in fact generally not much more than a summary, with only certain parts transcribed in full. A complete transcript of an entire interview is normally only prepared for the most serious cases. While this may be an understandable short-cut from a practical perspective (a full transcript will inevitably cost more in time and resources), it is another significant change to the original interview data, with the edited transcript now providing only a highly selective record of the interviewee’s words. It is worth emphasising that it is this edited version which will be presented to the CPS and used in deciding whether or not the matter should proceed, and indeed is the version generally presented to the court. Yet this editing process is performed almost entirely by transcribers with little to no training, who are nonetheless entrusted to select subjectively which parts of the interaction will be recorded and reproduced. 14 Given that they are also routinely (civilian) police employees, often working closely alongside police officers, there is a plausible risk that their agenda and relevance criteria will be skewed towards the police perspective.
An indication of the assumptions, and therefore potential biases, at work can be gleaned from an analysis of references to interviewees during our focus group meeting. In the course of a 54-minute discussion, interviewees were referred to in the third person by the ROTI clerks a total of 53 times, either with nominal or pronominal reference. For example: ‘but what I do is if
ROTI clerks’ references to interviewees (nouns).
* Although note this can also have a pejorative sense.
ROTI clerks’ references to interviewees (pronouns).
Police participant’s references to interviewees (nouns).
* Again, a degree of interpretive caution is required; there was certainly an element of sarcasm in the descriptor ‘these customers of ours’ – ROTIgroup_turn517.
Police participant’s references to interviewees (pronouns).
This implies a worrying assumption on the part of the ROTI clerks that the person whose interview is being transcribed will at the very least be charged with an offence (‘defendant’), or actually convicted (‘offender’); in other words that the allegations being made at interview are substantiated. Of course, the interview should be part of the process of investigating whether that is in fact the case. Further, it reveals a default assumption that an interviewee is male. While this may be in line with the majority of their experience (see Ministry of Justice, 2016), it potentially makes a female interviewee marked in their perception, leading to possible bias when faced with transcribing an interview with a female (see e.g. Worrall, 1990 on the dangers this poses).
Any such biases are potentially a real danger given the scale and importance of the tasks currently entrusted to ROTI clerks. They are in fact routinely performing a highly significant quasi-legal function on evidence, without any legal training. When discussing their work, the ROTI clerks were at pains to emphasise their recognition of the requirement to make ROTIs ‘balanced and accurate’ at all times. Yet many aspects work against that aim. Concomitant with their employment status, professional identity and physical location, they are reliant on input from only one perspective when determining which parts of the original interaction will be included in the edited ROTI: the prosecution perspective.
When asked about the principles they apply to the editing process, the ROTI clerks emphasised the importance of including all ‘points to prove’ in the edited transcript. When asked how they knew what amounted to a ‘point to prove’, they produced a folder of A4 sheets which give a summary of many common criminal offences, such as theft and criminal damage. These begin with a box such as Figure 2 for theft. Although each element is expanded upon underneath, the text is written with a clear assumption of working legal knowledge.

Extract from the ROTI clerks’ guidance (as used by this particular group of clerks, not necessarily by clerks in general).
First, it seems entirely unrealistic to expect those with no legal qualifications to understand and apply such information, thus creating a high risk of misapplication and error. For this to form the basis for deciding which elements are included in an evidential document, to be presented in court as evidence against a defendant, and used as a point of comparison for the application of s. 34 CJPOA 1994, seems unacceptable. Yet at least some form of information about what is legally relevant is provided in this force; further investigation is necessary to establish practice more widely.
Second, not only were these documents not written for a lay audience, they are also clearly written from one particular perspective: that of the prosecution. There are references throughout to charging decisions and considering alternative offences, indicating that they are taken from guidance for prosecutors. 15 For the ROTI clerks using these texts as their guidance, this leaves a serious gap when it comes to what the defence may regard as relevant to include. For example, there are various situations where all necessary elements of an offence are made out (the ‘points to prove’), but a separate factor is present which amounts to a justification or excuse. Typical examples are self-defence, duress and mistake. If the editor of an interview recording has instructions as to what amounts to a ‘point to prove’, but not about what available defences might look like, there is a real risk that evidence supporting those defences will not be included in the ROTI.
The following example illustrates the editing process in action. It is taken from the same case as Example 5 above. The incident in question occurred in the interviewee’s own home, which police officers had attended in response to a phone call from another person. The interview lasted for 15 minutes, but when the audio file is edited down just to the parts transcribed in the ROTI, this amounts to only 7 minutes of talk. Over half of the interview is either omitted or summarised, as seen here:
Example 6A: Official ROTI
SUSPECT was informed that due to the nature of his ex-wife’s phone call to Police Officer’s [sic] and their subsequent concern for him they attended at his home address.
Example 6B: My transcription of the same data
°right. (0.7) °okay. (1.1) you’ve been arrested, (having) dropped the knife. right? (1.5) okay. is there anything you ask
(1.0)
why did they have to (call me) an- cause me all that
right. because of the nature of your wife’s call. or your ex-[wife]
[but they]
°right.
(1.1)
I weren’t causing
The IE’s turns here are omitted completely from the ROTI. At this point the IE puts forward his perspective on events, raising a question as to why the incident escalated as it did. Here we learn that the officers had apparently already checked on the IE’s welfare and left the property (B4); since he was subsequently arrested, they must then have returned, for reasons which are not addressed and for which no satisfactory explanation is given. Yet the only part of this section which is recorded in the ROTI is the IR’s response to this question (A1 and B3), which does not amount to an adequate explanation for the officers’ return, as the IE points out in his response (B4 and 6). Further, the producer of this ROTI chooses to add in an element which does not occur in the interaction: the police’s ‘subsequent concern for him’ (A1). The inclusion of this embellishment, which conveniently portrays the police as acting out of compassion and in the IE’s best interests, is rather alarming. When combined with the fact that only the IR’s (enhanced) justification is represented in the ROTI while all counter-points raised by the IE are omitted, this is surely not a balanced representation of the interaction here. But, interestingly, the ROTI thereby also omits the IE’s own account of behaving rather abusively towards the officers (B4). Overall, this small example is a strong indicator of the power vested in those who are entrusted to produce ROTIs, especially given that, once available, ROTIs are generally the only source consulted by those involved in the case, and are the version presented in court as evidence.
Problem 4: Courtroom presentation
When it comes to the stage of presenting the interview to the court as part of the prosecution case, we have already noted that the ROTI is nearly always relied upon as sole evidence of what took place in the interview room. This is problematic enough in itself, given the various factors just discussed. But, rather than simply handing the court a copy of the transcript, the rather bizarre custom is to present the transcript orally. 16 In other words, the transcript is read out loud in court by a police witness acting as the interviewer, and—almost incredibly—the prosecutor generally takes the part of the defendant interviewee. In so doing, the participants are free to put whatever interpretative spin they wish on the material, for example adding emphasis, slowing pace, varying intonation, and so on. This can result in a radical transformation of the original meaning and intention of the speakers. Paralinguistic and extralinguistic features, removed during the transcription phase, are now put back into the data—yet they are not those used by the original speakers, but those of the prosecutor and the police witness (who may or may not be the original interviewing officer). Even with the best intentions, and speaking as someone who has performed this task as a prosecutor, it is almost impossible to avoid manipulating the data for one’s own agenda—which is the securing of a conviction.
Yet in the eyes of the court, the same words are used and so the message, and the interpretation, presumably must be the same. The bench and/or jury will be provided with copies of the transcript to follow during this presentation, to which they can refer later on. This is perhaps viewed as some form of safeguard, in that they are free to see the ‘actual words used’ and form their own opinion as to the correct intonation and intended meaning. However, any subsequent reading of the transcript is bound to be heavily influenced by the oral rendition they have just heard. (And in any case we have already seen that it is highly problematic to consider the official transcript as an accurate version of what was actually said.)
The process of converting the written data back into spoken form, then, involves just as much subjective interpretation, guesswork and plain inaccuracy as the reverse process discussed above. This can be seen in the following examples from R v Shipman, taken from the part of the trial where the interview was presented to the court as (prosecution) evidence.
Example 7: Official court transcript, Shipman Trial Day 23
“But there’s no mention in that entry which you claim to be for that date about taking a blood sample from her once again. I can see what you are pointing at. HP.”
Pause. I think the punctuation is a little adrift here, isn’t it? “But there’s no mention in that entry, which you claim to be for that date, about taking a blood sample, from her. Once again I can see what you are pointing at. HP, ESR. It doesn’t actually say you have taken a blood sample from her.” Sorry, I am being told something.
I am not sure that the punctuation you have inserted is necessarily correct.
No.
I think there is also a typing error too, because - - - - - - - - - - - - -
Is there? Yes.
There is. It has got ‘HP’ and it ought to be ‘HB’.
H……
B.
Yes.
“It’s not the custom of most general practitioners to write: ‘I have taken a blood sample which would consist of this, this and this.’ Most general practitioners just write down what the blood test is that they are doing.”
[*I have added these notes to aid clarity; they do not occur in the official court transcript.]
At the outset, it must be acknowledged that this extract comes from the official court transcript, which cloaks the data in an extra layer of interpretation of its own. The punctuation here is thus the court reporter’s. But the basic point is still clear. The police witness’s attempt to follow the official transcript of interview goes astray, either through the punctuation inserted by the interview transcriber, or through his own choice of intonation in reading it aloud. Prosecution Counsel recognises this and makes his own attempt at reading it out, but the judge interrupts, apparently because he has a different idea of how the data should be read. Note that the difficulty is, tellingly, referred to in terms of ‘punctuation’—a purely written language feature—instead of being described as a question of intonation or emphasis. There is no reference at all to how the words in question should sound, illustrating that all concerned are treating the data purely as a written document. The oral format, that is the original interview itself, is apparently long forgotten.
In addition, we see the (understandable) confusion of voiced and voiceless stops with ‘HP’ for ‘HB’, a medical abbreviation used by Shipman in his patient notes. This in itself may well have been of little consequence. But it still necessitated a correction by Defence Counsel, creating a further interruption. It is crucial not to lose sight of the fact that the point of this process is to present the interview to the court as evidence. Yet the actual exchange which took place in the interview room is completely overshadowed.
In fact, a potentially significant point does occur here, but is barely noticeable amid all the confusion: Shipman dodges the point being put to him. A common strategy used by Shipman in this interview is to appear co-operative while in fact using a variety of avoidance tactics in response to the police questioning. (For a more detailed discussion, see Haworth, 2006.) Here he avoids addressing his own actions by referring instead to general medical practice. But, given the amount of interruption between the two turns from the original interview here, this subtle feature is all but lost amid the difficulties created by the multiple changes in format.
The following is a further example. Once again we must note the caveat that this is the written version produced by the court reporter from the oral proceedings, and of course this is entirely different to the experience of those present at the time. Nonetheless, the confusion and loss of meaning is clear to see.
Example 8: Official court transcript, Shipman Trial Day 23
“We asked you earlier about the will and you say you have no knowledge of that. Correct?”
“That was correct.”
“But I think you said something else that wasn’t, well, wasn’t quite that answer, ‘I’ve no knowledge of it,’ so I’d like you to explain the ‘but’…”
Now can we just try that again because the meaning of it may have been lost. The “I’ve no knowledge of it but…” is a quotation. So can you just read it again, please?
“But I think you said something else that wasn’t, well, wasn’t quite that answer. You: ‘I’ve no knowledge of it but…’ I’d like you to explain the ‘but’.”
Please continue.
The problem here is twofold. First, Prosecution Counsel’s interjection suggests that the police witness has failed to use the appropriate intonation to indicate that part of his utterance was a quotation. (He, of course, had to guess at the ‘correct’ intonation by interpreting the punctuation added by the transcriber, which in turn was their interpretation of the original speaker’s intonation.) Second, the police witness has also omitted a vital word: ‘but’. This word, as originally used by Shipman, is in fact the whole focus of the interviewer’s turn. The combination of these reading errors results in the exchange making no sense, forcing the prosecutor to go back and seek corrections, thus interrupting the flow of the interview evidence (as also seen in Example 7). This leads to the absurd situation that in the middle of this exchange, we effectively have the prosecutor quoting the police witness quoting the police interviewer quoting Shipman. The jury could be forgiven for finding this whole exchange rather difficult to follow, even with a transcript in front of them. It is difficult to see how this can be described as an effective method of presenting the evidence.
By tracing the processes undergone by interview data from interview room to courtroom, then, we have identified serious problems with evidential preservation and consistency. Further, this has shown that at the most important stage of the criminal justice process—the court hearing—the most corrupted version of the evidence is utilised. This is clearly not a desirable correlation.
Other jurisdictions and contexts
Despite the problems outlined above, it must nevertheless be recognised that this is finding fault with one of the few legal systems in the world that routinely makes formal audio recordings of all police-suspect interviews. Most European (civil law) jurisdictions, for example, use only a written summary of the interview produced without the aid of a recording. This may be created during the interview process itself, or produced afterwards based on the interviewer’s notes. In many cases there is no attempt to reproduce the question-answer sequence, but instead the dialogue is converted into a monologic narrative in either the first or third person, authored of course by the creator of the document and not the interviewee. Not surprisingly, research has shown that this results in an even poorer representation of what interviewees actually said during interview: see Komter (2002, 2006), Van Charldorp (2011) on the Dutch process; Jönsson and Linell (1991) (Sweden); Eades (1995), Gibbons (1995) (Australia). More troublingly, in China, Mou’s ethnographic study documents deliberate and routine alteration of records of suspect interviews to align with the ‘official version of facts’ (2017: 78), including accounts of threats to interviewees’ families if they did not sign the falsified interrogation records (2017: 80–81).
Compared to practice elsewhere, then, it perhaps seems harsh to be critical of the current E&W treatment of interview data. Yet however progressive the E&W treatment of police-suspect interview data may be, unfortunately this approach has not been extended to witness interviews. These still involve the production of a monologic summary statement authored by the interviewer but written as if a first-person narrative by the witness 17 (see Rock, 2001 on the inadequacies of this process). The fact that witness interviews are not accorded the same treatment as suspect interviews probably stems from the original reasons for introducing the recording requirement in PACE, as discussed above: there is (generally) less need to consider protecting witnesses from the police than suspects. But this also indicates that, despite the fact that recordings and transcripts of suspect interviews have been the norm for some time now, the benefits of taking better care of interview evidence have yet to be recognised.
Discussion
It is clear that the current formats in which interview data are used are far from ideal. Further, the format changes which they undergo raise serious questions regarding evidential consistency. It is a long-established principle of police investigative practice that high levels of preservation must be applied to physical evidence, in order to avoid any contamination which may undermine its evidential merit. Yet the same system currently builds contamination into the processing of interview data at an institutional level, without any apparent concern for the evidential consequences. This appears to stem from a lack of recognition that changes in the format of linguistic data involve transformation of the data themselves. A first step in improving current practice, then, is to increase awareness of that simple fact.
There is also scope for several specific improvements, all based on the principle of preserving the original data as intact as possible, and using them in as near as possible to their original form. These are as follows.
Practical implications and recommendations
All police interview recording equipment should be switched from outdated audio cassette tapes to digital recording, in order to ensure better data quality at source. Although this change was enabled by an amendment to PACE Code E in 2010, 18 it does not yet appear to be standard in the many police interviewing rooms across the E&W jurisdiction. Even in the current challenging financial climate for E&W police forces, developments in digital technology have made this a much more affordable and viable option than previously, and any outlay is surely far outweighed by the potential cost (financial, reputational, and moral) of investigative errors, appeals, and miscarriages of justice.
Further consideration, and research, should be directed towards the routine use of video recording. Again, technological developments make this increasingly viable: it is already routinely used for all suspect interviews in New Zealand, for example. 19 Video would capture considerably more of the original context, although it is by no means a complete fix (for useful discussion of the key issues, see Gibbons, 2003: 34–35 and Brown, 1997: 154–155). It is also worth noting that several of the ROTI clerks I spoke with expressed a preference not to watch video footage while transcribing, even when it is available. It is therefore recommended that further research is undertaken into the use of video recording of interviews, taking into account the views of all key stakeholders.
A standard code of practice for ROTI transcription should be introduced. This should include a set of standard transcription conventions, to cover features such as overlaps, pauses, and any areas of uncertainty. This would ensure consistency in production and interpretation, which would be especially beneficial at the courtroom evidence stage. In order to be fully effective, this must be implemented at jurisdictional level, rather than regional force level. Further research should be undertaken to establish which features should be included, in order to balance the competing demands of information preservation and readability.
Better training should be given to transcribers. Standard training for transcribers should be given on appointment. This should include some introduction to language and spoken communication, and the differences between spoken and written language. It should also address the editing process, giving an indication of the principles to be applied when deciding what should be included in full, or in summary form, or left out altogether. (Note that it is not considered practical to recommend that transcripts are not edited, however desirable this may be in theory: financial and time constraints mean that this is always likely to be necessary to some extent.) There should also be a system for providing ongoing training to keep transcribers up to date with relevant legal changes and, crucially, with how these affect their practice. For example, the ROTI clerks I met had received updates on legal principles surrounding factors such as mentioning prior convictions or other aspects of character, but differed in their interpretation of how that fed into their practice, with some omitting any such detail while others left it in. This demonstrates how partial or limited knowledge can be more dangerous than no knowledge at all. However, it is suggested that serious thought be given as to whether it is in any case appropriate to let the onus for so many important evidential functions fall on largely untrained and unqualified ROTI clerks. This is by no means intended as criticism of those clerks, but rather a call to recognise and re-evaluate their significant role in the evidential chain.
Those subsequently assessing the interview as evidence should listen to the original recording rather than relying on the official transcript. It is hoped that this article has demonstrated the benefit of using, whenever feasible, a data source as close as possible to the original. The use of a transcript rather than a recording at trial seems generally to be merely an administrative shorthand, which the ready availability of digital technology is making even less necessary. It is always open to the defence to request that the original recording be played, rather than relying purely on the written transcript, and I would suggest that this is an option which should be taken up rather more than is currently the case. 20 Further, if there is ever any doubt about what a suspect actually said at interview, then this doubt should be resolved in favour of the suspect interviewee.
The practice of reading aloud the interview transcript in court should be abandoned. This additional format change is, I would argue, of no benefit but potentially considerable detriment. The examples shown above of this process in action demonstrate that it adds a further unnecessary layer of distortion, confusion and corruption to the interview data. Further, given that this task is performed by the prosecuting lawyer and a police witness, any shift in emphasis or interpretation, intentional or otherwise, is most likely to be in a direction which favours the prosecution case. But of course any distortion of evidence is in the interests of neither side. If this change is not considered practicable, then at the very least defence counsel should perform the turns of the interviewee, to provide more balance than the current arrangement.
Conclusion
To conclude, in this article we have observed the various transformations that interview data undergo from the initial interview to their production as evidence in a courtroom, and revealed serious flaws in the current process. While such transformations could, at an individual level, be perceived as unlikely to cause any real interference in the course of justice, when we are dealing with evidence, and factors which have the potential to influence the opinion of a jury towards a defendant, there is no room for complacency. It also cannot be overlooked that power over the data, and hence over a suspect’s own words, is held almost exclusively by the prosecution side, while the potential detriment is mainly to the defence. There is a generally accepted principle that all evidence should be preserved as intact as possible, and treated with fairness to both sides. It is time to acknowledge that this principle currently does not extend to interview evidence.
Current moves towards digital recording present an ideal opportunity to revisit practice, and for professionals to work alongside researchers to develop an approach to handling linguistic evidence which is itself evidence-based. Ultimately, any increase in understanding of the linguistic factors which affect police interview data can only enhance the system and reduce the risk of error, misinterpretation and injustice.
Footnotes
Author’s note
The initial research on which this article is based was funded by the Economic and Social Research Council (ESRC).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Key to transcription
(n.b. Court transcripts and ROTIs were not produced by the author and do not follow these conventions.)
