Abstract
A reliable lie-detection method would be extremely useful in many situations but especially in forensic contexts. This review describes and evaluates the range of methods that have been studied. Humans are barely able to pick up lies on the basis of nonverbal cues; they do so more successfully with systematic methodologies that analyze verbal cues and with physiological and neuroscientific methods. However, the rates at which people are able to detect lies are still well below the legal standard of “beyond a reasonable doubt.” This means that the utmost caution must be exercised when such methods are employed. In investigations where independent evidence exists, there is emerging evidence that interviews based on a free account followed by the gradual introduction of the evidence by investigators can reveal inconsistencies in a guilty interviewee’s account. Automated machine-learning methods also hold some promise.
The premise of a world in which people do not lie is explored in various films: In The Invention of Lying, Ricky Gervais lives in a society where he is the only one who can lie, and in Liar Liar, Jim Carrey is unable to lie for a day. Both films effectively show through reductio ad absurdum how lies and half-truths are inextricably built into normal social interaction. We all lie, many of us every day, a few of us several times a day (Serota et al., 2021), presumably with a low ecological detection rate.
In recent decades, psychological science has intensively researched the question of how best to tell who is lying. Reliable ways of distinguishing a false from a true account would be extremely useful in many practical contexts, including hiring processes, interviews of asylum seekers, and the legal system from the start of an investigation all the way to court and beyond. In the absence of such reliable indicators, methods with overblown claims and no scientific backing have proliferated (Denault et al., 2020), and the withering of this growing market would be a serendipitous side effect of producing a truly functional lie-detection method.
What characteristics must a practical lie detector have? It would correctly categorize lies as lies and truths as truths. How successfully should it ideally manage this dual task? The deleterious consequences of falsely categorizing a truth-teller as a liar vary from domain to domain; in the legal sphere, they can lead to miscarriages of justice, so the answer in the forensic context is “beyond a reasonable doubt,” defined as 90% to 95% certainty (Magnussen et al., 2014). More generally, there needs to be an awareness that the output of a lie-detection method that is incorrect 30% of the time can at most inform a working hypothesis and should not be interpreted as strong independent evidence one way or the other. This article reviews the status and promise of the main approaches, both those that rely on our intuitive skills and may be useful in everyday contexts and those using technological methods aimed at detecting lies and deceit in professional investigative contexts.
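The gap between a method that is wrong 30% of the time and the 90% to 95% standard can be made concrete with Bayes' rule. The sketch below is illustrative only: it assumes that the 30% error rate applies equally to liars and truth-tellers, and the proportions of liars among those tested (the base rates) are hypothetical values chosen for the example, not figures from the literature.

```python
def probability_of_lying_given_flag(base_rate, hit_rate, false_alarm_rate):
    """Bayes' rule: P(person is lying | the method says 'lie')."""
    flagged_liars = hit_rate * base_rate
    flagged_truth_tellers = false_alarm_rate * (1 - base_rate)
    return flagged_liars / (flagged_liars + flagged_truth_tellers)


# Assumption: the method errs 30% of the time for liars and truth-tellers alike.
HIT_RATE = 0.70
FALSE_ALARM_RATE = 0.30

# Hypothetical proportions of liars among the people tested.
for base_rate in (0.5, 0.2, 0.05):
    posterior = probability_of_lying_given_flag(base_rate, HIT_RATE, FALSE_ALARM_RATE)
    print(f"base rate {base_rate:.2f} -> P(lying | flagged) = {posterior:.2f}")
```

Under these assumptions, a "lie" verdict reaches 90% certainty only if roughly four out of five of the people tested were lying to begin with, which is why such an output is better treated as a working hypothesis than as evidence.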
Nonverbal Cues to Deception
Historically and cross-culturally, people have believed that liars can be identified through their nonverbal behavior, with cues such as avoidance of eye contact and touching or scratching themselves (The Global Deception Research Team, 2006). Many creatively designed studies have tested whether participants can successfully determine whether people are lying or telling the truth. In addition to studies in which convenience samples were asked to judge videos of people who have been instructed to lie, there are also experiments with judges, psychiatrists, and detectives as participants as well as other research showing perpetrators of serious crimes lying. The conclusions of an influential meta-analysis by DePaulo et al. (2003) are still valid: The nonverbal cues to deception, although believed in by laypeople, are faint (Vrij et al., 2019). In meta-analytical simulations incorporating publication bias, Luke (2019) showed that the literature is in fact consistent with there being no human ability to detect nonverbal cues to deception, and performance is similarly low when methods have been used to attempt to detect lies in children (Gongola et al., 2017). Thus, the data show that the widespread beliefs are incorrect, and the field of lie detection via nonverbal cues is an excellent example of the power of science to debunk a myth (Brennen & Magnussen, 2020).
Verbal Cues to Deception
The clear message of the DePaulo et al. (2003) article refocused researchers’ effort on lie detection via verbal methods. Systematic instructions have been developed to analyze written and oral witness statements, on the basis of theories of lying and communication. Several of these have reported hit rates in the range of 70%, substantially better than reliance on nonverbal cues. The most widely used method, and among those with the strongest empirical support, is criterion-based content analysis (CBCA), in which statements are scored on 19 criteria such as “logical structure of the account” and “unexpected complications in the account” (Oberlader et al., 2016, 2021). There are challenges regarding the reliability of coding of the criteria, but even more pertinent are the limitations pointed out by Oberlader et al.: If one adjusts the decision criterion to minimize the incidence of calling a true statement a lie, the rate of detection of deceitful statements is reduced to 9%, with a similar collapse in the detection of true statements if one prioritizes catching deceitful statements. These caveats are not currently salient in the plentiful citations of these meta-analyses. CBCA and several of the other verbal-cue methods do distinguish between truthful and deceitful statements to a statistically significant degree. This is nevertheless a long way from showing that they are directly applicable in the forensic context.
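The trade-off that arises when the decision criterion is shifted can be illustrated with a standard equal-variance signal-detection model. This is a sketch under stated assumptions, not the analysis performed by Oberlader et al.: it assumes that a method yields a single deception score that is normally distributed with equal variance for liars and truth-tellers, and the false-alarm rates passed in are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime_from_accuracy(accuracy):
    """Sensitivity d' implied by a given accuracy at a neutral criterion."""
    return 2 * STD_NORMAL.inv_cdf(accuracy)


def hit_rate_at_false_alarm(d_prime, false_alarm_rate):
    """Proportion of lies detected once the criterion is fixed so that only
    `false_alarm_rate` of truthful statements are called lies."""
    # Truthful scores ~ N(0, 1); deceptive scores ~ N(d_prime, 1).
    criterion = STD_NORMAL.inv_cdf(1 - false_alarm_rate)
    return 1 - STD_NORMAL.cdf(criterion - d_prime)


# Assumption: roughly 70% accuracy when both kinds of error are weighted equally.
d_prime = d_prime_from_accuracy(0.70)

# Hypothetical false-alarm rates, from lenient to strict.
for false_alarm_rate in (0.30, 0.10, 0.05, 0.01):
    hits = hit_rate_at_false_alarm(d_prime, false_alarm_rate)
    print(f"false alarms {false_alarm_rate:.2f} -> lies detected {hits:.2f}")
```

In this model, tightening the criterion to protect truth-tellers lowers the proportion of lies detected far below the headline 70%, which is the qualitative pattern described above.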
In an innovative study, Van Der Zee et al.’s (2022) starting point was The Washington Post’s fact-checking of Donald Trump’s tweets from 3-month periods while he was president. The tweets were then entered into an extremely detailed text-analysis program to see whether one could reliably classify the tweets as factually correct or incorrect on the basis of verbal cues. Although this personalized approach had a higher hit rate for Trump’s tweets than more general theoretical and data-driven models, it is still noteworthy that overall accuracy was under 80%: Even a thorough quantitative analysis of the verbal cues contained in hundreds of statements from a single person does not reach the “beyond a reasonable doubt” standard.
We now turn our attention from methods aimed at detecting lies in relatively naturalistic contexts to those more specifically used by law enforcement. This reflects the development in the field’s research strategy: Whereas previously there was a search for general-purpose cues to deception, there is now a narrower focus on the forensic context.
Methods That Manipulate Statement Production
Accepting the unlikelihood of nonverbal cues ever being forensically useful, Vrij and Granhag (2012) proposed an approach to lie detection in investigative interviewing based on well-documented cognitive differences between telling a lie and telling the truth. For instance, lying is mentally more demanding and requires planning, and it has been shown that interview techniques that impose cognitive load and pose unexpected questions indeed allow observers to categorize lies and truths more effectively. Similarly, liars have a tendency to keep their accounts brief. When participants are shown a very detailed model statement that their own statement should emulate, truth-tellers produce more unexpected complications and peripheral details, whereas liars keep their stories straighter (Vrij et al., 2018). Such methods are still in development, and recent meta-analyses have concluded that the current literature shows evidence of publication bias and that the methods are not yet ready for transfer to the applied arena (Levine et al., 2018; Mac Giolla & Luke, 2020).
The Polygraph and Neuroscientific Methods
Several methods of lie detection rely on the bodily changes that arise when telling a lie. The logic is that only guilty suspects would show increased physiological arousal when asked about aspects of the crime in question. The central challenge for physiological methods has been to determine a baseline against which to compare the person’s reaction to the key crime-related questions. The polygraph is a machine that simultaneously measures several indices of a person’s physiological arousal, such as heart rate, blood pressure, respiration, perspiration, and skin conductivity. It has long been controversial, and the essence of the debate remains the same today as it was in an authoritative report from the U.S. National Research Council (2003; Iacono & Ben-Shakhar, 2019). The most widely used method is the comparison-question technique (CQT), which analyzes the pattern of psychophysiological reactions to three types of questions: Those to which both guilty and innocent persons are expected to tell the truth (e.g., “What is your name?”), those to which both groups are expected to lie or at least struggle to answer with a definitive “no” (e.g., “Have you ever taken anything that didn’t belong to you?”), and those to which a guilty person is expected to lie and an innocent one to answer truthfully (e.g., “Did you steal the necklace?”); all questions are agreed on before the test, so the surprise factor is eliminated. Reviews show that the CQT in conjunction with the polygraph distinguishes between lies and truth with a hit rate of around 70% (Iacono & Ben-Shakhar, 2019).
Another polygraph technique is the concealed-information test, also known as the guilty-knowledge test. It relies on the following logic: If it is not known to the public that the weapon used in a murder was a piece of rope, then only the perpetrator should show raised psychophysiological activation when saying “no” to “Was rope used as a weapon in this crime?” when compared with saying “no” to questions with alternative weapons. Thus, whereas the CQT is attempting to pick up signs of deception directly, the guilty-knowledge test is looking for signs of memory, a subtly different task that may allow an inference of deception. In a meta-analysis, Meijer et al. (2014) found that in lab-based studies, the polygraph discriminates between “guilty” and control questions (with a larger effect size than the CQT discriminates between truth and lies). However, there are few studies in real-life settings, and the caveats outlined above about conclusions in a specific case still apply.
The development of modern brain-imaging methods, such as functional MRI (fMRI), that can register a person’s patterns of brain activity as they perform cognitive tasks, has opened the possibility that lies may be detected by looking inside the brain rather than at the peripheral psychophysiological responses recorded by the polygraph. Many studies have investigated this topic using subtle experimental manipulations with participants in an fMRI scanner, and they report differences between patterns of brain activation when people lie and when they tell the truth. However, a forest of obstacles sits between such results and the technology’s potential application in forensic settings (Jones & Wagner, 2020). In experiments, there are typically many participants who make many responses, and the findings are averages over many trials, whereas the question in practical contexts is often “Is this person lying about this issue now?” The results from brain-imaging studies are not robust enough to bridge that gap. The extent to which the lies that participants in such studies are instructed to tell are neurologically and morally equivalent to lies in everyday life is another issue that may be fatal to this approach (Sai et al., 2021). There is also a problem of the necessity of compliance because the method depends on a suspect’s willingness to perform esoteric tasks repeatedly while lying inside a noisy, claustrophobic tube. In addition to these in-principle difficulties, the fMRI literature tends to show that, at a broad level of analysis, the processing of distinct concepts may share the same neural substrate, including pairs of terms that are analogous to the truth–deceit distinction, for instance, differences between perceiving and imagining visual scenes (Dijkstra et al., 2019). This suggests that even if the theoretical objections can be overcome, the empirical reality may turn out not to be conducive to promoting neural lie detection. 
As with nonverbal methods, there are substantial economic rewards on offer to a successful body-based lie detector, so any claims of success need to be thoroughly scrutinized.
Meijer and Verschuere (2017) argue that body-based techniques face challenges that are not bound to the accuracy of the technology so much as to the chain of logical reasoning when determining lies. Certain topics or questions may induce the interviewee to lie, which will provoke physiological reactions that the machine can pick up. However, such activation is not uniquely triggered by deception, so one cannot conclude that a lie is being told from bodily reactions. Note that the guilty-knowledge test is not affected by this logic because it is not aimed at detecting deception.
Free Recall Followed by Confrontation With Evidence: The Basis of a Common-Sense Approach
It has long been known that human autobiographical memory is often inaccurate; in addition to normal forgetting, a number of factors induce systematic memory errors and even false memories (Schacter, 2022). In instances where it can be shown that it is likely that these caveats do not apply, we can say that a person is lying when they say something that is demonstrably at odds with objective reality, which in the forensic context might be independent evidence. This truism from everyday life is the basis of the methods with humans as observers that have the most potential and the most empirical support. The related methods of strategic use of evidence and tactical use of evidence have been developed as police-interview techniques. The interviewee or suspect is first asked to provide an account of the incident with open and follow-up questions to obtain as thorough an account as possible. For example, an interviewee might initially claim that they did not leave their home on the day in question. Gradually, the authorities can then introduce the independent evidence they possess into the questioning, possibly revealing inconsistencies with what the interviewee has said (e.g., first mentioning that there is evidence that contradicts the person’s claim before showing closed-circuit TV footage of the person at a shopping center) and asking how the interviewee can account for this. A meta-analysis shows that strategic use of evidence induces more such statement–evidence inconsistencies (potential lies) in liars than in truth-tellers and that these techniques are more effective with late and gradual release of the evidence (Oleszkiewicz & Watson, 2021). In cases where investigators do not have other evidence regarding the crime (e.g., when it is purely a case of one person’s word against another’s), these methods cannot be applied.
Such methods reflect the evolution of the field from a sort of hunt for psychological X-ray specs to the integration of lie detection into methods for the reliable elicitation of information.
Machine Learning
More recently, machine learning has been applied to lie detection. There are results that demonstrate superior performance compared with humans on the same material (Kleinberg & Verschuere, 2021), and Tomas et al. (2022) mapped out how human and machine approaches can be combined for the detection of deceit in written accounts. Given the history of technological progress, it seems likely that machine-based lie detection will, for some purposes, become practically viable. We note, for example, that the fact that humans are unable to pick up nonverbal cues to deception does not rule out the possibility that machine-learning algorithms will be able to do so.
A study by Krishnamurthy et al. (2023) illustrates both the potential of the approach and the current limitations. Automated extraction of audio, textual, and visual details was performed on a set of 121 videos from courtrooms in which the truth status of the witnesses was known. Then these data were used as the input to a neural net, which correctly categorized the statements at a rate of 96%. Factors that temper the interpretation of this promising result include the low number of clips and the phenomenon of overfitting of a model to the training data.
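The overfitting concern can be illustrated with a deliberately extreme toy case. The sketch below does not use the model or the data of Krishnamurthy et al.: a one-nearest-neighbor classifier stands in for any flexible model that can memorize its training set, the features are pure noise, and the labels are coin flips, so there is nothing to learn. Only the number of training clips (121) is taken from the study; the number of features and the size of the fresh test set are arbitrary assumptions.

```python
import random


def make_random_clips(n_clips, n_features, rng):
    """Noise features with coin-flip labels: by construction, no real signal."""
    return [
        ([rng.gauss(0, 1) for _ in range(n_features)], rng.randint(0, 1))
        for _ in range(n_clips)
    ]


def predict_1nn(train, features):
    """Return the label of the closest training clip (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda clip: sum((a - b) ** 2 for a, b in zip(clip[0], features)),
    )
    return nearest[1]


def accuracy(train, data):
    """Share of clips in `data` whose label the memorizing classifier gets right."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


rng = random.Random(0)                     # fixed seed for reproducibility
train = make_random_clips(121, 20, rng)    # 121 clips, as in the study
test = make_random_clips(1000, 20, rng)    # hypothetical fresh clips

train_accuracy = accuracy(train, train)
test_accuracy = accuracy(train, test)
print(f"accuracy on clips used for fitting: {train_accuracy:.2f}")
print(f"accuracy on fresh clips:            {test_accuracy:.2f}")
```

The classifier is perfect on the clips it was fitted to and close to chance on new ones, which is why accuracy figures from small data sets need to be confirmed on independent material before they are taken at face value.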
Potential Domains of Application
It is necessary to emphasize that the approaches reviewed here have different possible domains of application, some of which are quite restricted, as summed up in Table 1. Nonverbal methods can be used in all contexts in which a communicator’s face or body is visible, including daily life, airport security, and forensic settings. Verbal methods are mainly based on written accounts (e.g., of witness statements), which makes them well suited to investigations when one has time to apply a systematic procedure. Body-based methods until recently have been more invasive and require a dedicated test session with a compliant interviewee. Methods based on an initial free account depend on the interviewer being in possession of independent information, whereas delineating how automated methods can be applied is like trying to shoot a moving target in the dark because the field is in a stage of such innovative development.
Table 1. The Domains of Application and an Assessment of Lie-Detection Methods
Note: CBCA = criterion-based content analysis.
As pointed out, in the last decade, the field of lie detection has focused on the forensic context. What seems likely is that the machine-learning approach will reverse this trend and develop lie-detection methods for a wider variety of domains (e.g., insurance fraud and online reviews, which it is worth noting are domains where the “beyond a reasonable doubt” criterion does not necessarily operate). Our review of the empirical database nevertheless shows that few methods are currently ready for application in their domains.
William Blackstone’s credo of 1787 that it is “better that 10 guilty persons escape than that one innocent suffer” (Blackstone, 1787, p. 352) has been a guiding principle of Western legal systems. On the other hand, when law enforcement is trying to detect malignant intent in crowds or queues, which following September 11, 2001, has been a priority in the United States, the equation is reversed because the consequences of missing a potential terrorist can be so enormous. Nevertheless, the evidence for the ability of such programs to detect people with evil intentions is weak (Denault et al., 2020; Meijer et al., 2017). This is in large part explicable by their reliance on nonverbal cues to deceit that, as we saw above, are faint to nonexistent.
Conclusion
Inasmuch as the best current methods are based on common sense, the field of lie detection can be considered a decades-long detour full of the hope of finding relatively simple cues that indicate lying, or, more positively, can be seen as a classic example of how a concerted research effort in experimental psychology has debunked a widely held intuition that has been expressed for millennia: The science shows that there are no reliable behavioral signs of deceit that humans are able to detect. The field is in its very nature applied and yet is also characterized by an awkward distance between the research and actual practical utility. There is evidence that some structured methods do indeed pick up some signal of deceit but with large error rates, meaning that great care must be taken in practical contexts not to overinterpret results, especially as such methods will typically be employed when there is an absence of alternative strong evidence. A false positive can change the course of an investigation, with the mechanisms of confirmation bias quickly leading people to overlook the fact that the initial evidence that triggered this new direction was not solid, and furthermore cause subsequent circumstantial evidence to be interpreted in line with the possibly faulty conclusion about a suspect’s deceit.
Surprising as it may seem, and despite a hundred years of research on the topic (Denault et al., 2022), currently “the best general advice from the psychological literature on verbal lie detection remains simply that a person is lying if what they say is inconsistent either with other things that they have said or with other evidence” (Brennen & Magnussen, 2022, p. 8). Perhaps one factor that makes the task of detecting lies so challenging is that psychological phenomena are intrinsically noisy (Kahneman et al., 2021). Researchers looking for reliable signs of deceit face the challenge of coping with large individual and cultural differences in behavior and an immense number of situational factors that affect any subtle behavior, such as lying. Maybe, then, there is little prospect of ever developing techniques that detect lies and deceit with a probability that approaches the “beyond a reasonable doubt” criterion. Two possible ways out from this cautious view are methods that do not passively detect lies but induce them (or statement–evidence inconsistencies) in guilty suspects and the probability that artificial-intelligence approaches will eventually provide reliable detection methods.
Recommended Reading
Iacono, W. G., & Ben-Shakhar, G. (2019). (See References). A careful, updated review showing that the polygraph is still not able to distinguish between truth and lies at forensically relevant rates.
Luke, T. J. (2019). (See References). A new look at the literature on detection of deceit showing that when one takes publication bias and study weakness into account, the limited human ability to tell who is lying may be even worse than previously concluded.
Oberlader, V. A., Quinten, L., Banse, R., Volbert, R., Schmidt, A. F., & Schönbrodt, F. D. (2021). (See References). A sophisticated meta-analysis of two of the most widely used methods of detecting lies from verbal cues using a range of bias corrections, concluding that they may distinguish between lies and truth at a rate of approximately 70% and suggesting how research practices might be revised to produce more useful methods.
Vrij, A., Hartwig, M., & Granhag, P. A. (2019). (See References). An authoritative review of the nonverbal detection of lies.
Wagner, A., Bonnie, R. J., Casey, B. J., Davis, A., Faigman, D. L., Hoffman, M. B., Jones, O. D., Montague, R., Morse, S. J., Raichle, M. E., Richeson, J. A., Scott, E. S., Steinberg, L., Taylor-Thompson, K., & Yaffe, G. (2016). (See References). A clear exposition of the many challenges to be resolved before one might reliably be able to detect lies by functional MRI.
