
Abstract

ChatGPT's large language model, GPT-4V, has been trained on vast numbers of image-text pairs and is therefore capable of processing visual input. This model operates very differently from current state-of-the-art neural networks designed specifically for face perception, and so I chose to investigate whether ChatGPT could also be applied to this domain. With this aim, I focussed on the task of face matching, that is, deciding whether two photographs showed the same person or not. Across six different tests, ChatGPT demonstrated performance that was comparable with human accuracy despite being a domain-general ‘virtual assistant’ rather than a specialised tool for face processing. This perhaps surprising result identifies a new avenue for exploration in this field, and further research should explore the boundaries of ChatGPT's ability, along with how its errors may relate to those made by humans.

Keywords

ChatGPT, face matching, large language model, artificial intelligence, face perception

In November 2022, OpenAI's ChatGPT was released to the public and we were in awe of its ability to tell jokes and do our assignments for us. However, with its fabricated citations (Walters & Wilder, 2023) and overuse of words like ‘delves’ and ‘significant’ (Kobak et al., 2024), we soon realised that our fear of artificial intelligence and the impending ‘rise of the machines’ may have been a tad premature. Here, with the latest advances as of September 2023, I argue that perhaps a little fear may still be warranted.

ChatGPT's newest large language model (LLM) – GPT-4V(ision) – can process visual input. Preceding models relied on extensive training with text data sourced from the internet, allowing them to learn the structure within these data. Now, through exposure to vast numbers of image-text pairs, the latest model can ‘see’, recognise and interpret uploaded images. Crucially, if any future Terminator is going to be successful in its mission, it will need to discriminate between people to target key members of the resistance. Therefore, I wanted to determine how well ChatGPT performed with face images in particular, building on previous research showing human levels of accuracy when identifying mental states from only the eye region of faces (Elyoseph et al., 2024).

Face matching involves deciding whether two photos depict the same person or two different people. This can be challenging when the faces are unfamiliar since the photos themselves lack information regarding how appearance can vary as a function of lighting, expression, hairstyle and so on (Hancock et al., 2000). Researchers have created several such tasks, varying in both trial difficulty and how unconstrained or controlled the images are, and so I tested ChatGPT on a selection of these (see Figure 1), with the results presented in Table 1.¹

Figure 1.
An example from the Glasgow Face Matching Test, showing ChatGPT's correct response.

Table 1.
Human and ChatGPT performance on several tests of face matching.

Test	Version	Trials	Human accuracy (%)	ChatGPT accuracy (%)
Glasgow Face Matching Test (Burton et al., 2010)	Short	40	81.3 (9.7)	92.5
Glasgow Face Matching Test 2 (White et al., 2022)	Short	80	75.0 (10.0)	76.3
Kent Face Matching Test (Fysh & Bindemann, 2018)	Short	40	65.9 (9.6)	62.5
Models Face Matching Test (Dowsett & Burton, 2015)	Long	120	64.5 (7.5)^a	88.3
Oxford Face Matching Test (Stantic et al., 2022)	Long	200	74.1 (5.4)	77.0
Expertise in Facial Comparison Test (White et al., 2015)	Short	84	77.9 (8.2)^b	81.0

Note. Human accuracies reported as M (SD).

^aValues from typical perceivers featured in Bobak et al. (2016).

^bValues from Stacchi et al. (2020).

I found that ChatGPT's performance was comparable with human abilities, which makes it less accurate than state-of-the-art deep convolutional neural networks (DCNNs), the best of which can match the accuracy of professional facial examiners (National Institute of Standards and Technology, 2024; Phillips et al., 2018). The key difference, however, is that ChatGPT is based on a transformer neural network that first converts images into text-like representations. Crucially, unlike DCNNs, ChatGPT can interpret any image rather than being limited to a specific domain, and it is freely available for use by Terminators (and the general public).
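To make the comparison in Table 1 concrete, the short Python sketch below (an illustration, not an analysis reported in this study) expresses each ChatGPT accuracy as a z-score relative to the human M and SD, and recovers the whole number of correct trials implied by each rounded percentage. The test abbreviations and the names `MatchingResult`, `correct_trials` and `z_score` are hypothetical labels introduced here. Note also that the human norms for the Models Face Matching Test and the Expertise in Facial Comparison Test come from separate samples (Bobak et al., 2016; Stacchi et al., 2020).

```python
# Illustrative sketch: compare ChatGPT's accuracy with the human norms in Table 1.
# All numbers are copied from Table 1; the abbreviations are labels added here.
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchingResult:
    name: str          # short label for the test (hypothetical abbreviation)
    trials: int        # number of trials in the version administered
    human_mean: float  # human accuracy, % correct (M)
    human_sd: float    # human accuracy, % correct (SD)
    chatgpt: float     # ChatGPT accuracy, % correct


TABLE_1 = [
    MatchingResult("GFMT short", 40, 81.3, 9.7, 92.5),
    MatchingResult("GFMT2 short", 80, 75.0, 10.0, 76.3),
    MatchingResult("KFMT short", 40, 65.9, 9.6, 62.5),
    MatchingResult("MFMT long", 120, 64.5, 7.5, 88.3),  # norms: Bobak et al. (2016)
    MatchingResult("OFMT long", 200, 74.1, 5.4, 77.0),
    MatchingResult("EFCT short", 84, 77.9, 8.2, 81.0),  # norms: Stacchi et al. (2020)
]


def correct_trials(r: MatchingResult) -> int:
    """Whole number of correct trials implied by the rounded percentage."""
    return round(r.chatgpt / 100 * r.trials)


def z_score(r: MatchingResult) -> float:
    """Distance of ChatGPT's score from the human mean, in human SD units."""
    return (r.chatgpt - r.human_mean) / r.human_sd


if __name__ == "__main__":
    for r in TABLE_1:
        print(f"{r.name:<12} {correct_trials(r):>3}/{r.trials:<3} z = {z_score(r):+.2f}")
```

Running the script prints one line per test. A z-score of this kind only describes distance from the normative mean; it ignores sampling uncertainty in both the norms and ChatGPT's own score, so it should be read as descriptive rather than as a significance test.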

Given that ChatGPT demonstrates competent face matching abilities, further study might investigate the underlying processes. For instance, are some facial features (e.g., the eyes) more heavily weighted during comparisons? Are some transformations (e.g., head rotation, expression change) more difficult to handle? How do ChatGPT's errors relate to human decisions? The possibilities for combining LLMs and face perception research are numerous and represent an exciting new avenue for exploration. Indeed, by probing the nature of its internal representations, we may uncover both similarities and differences with human face processing, resulting in new insights much like those emerging from research with DCNNs (e.g., Parde et al., 2019). In the meantime, although August 29, 1997 (aka ‘Judgement Day’) passed without incident, I recommend caution as we watch ChatGPT continue to evolve in the near future.


Footnotes

Author Contribution(s)

Robin Kramer: Conceptualization; Investigation; Methodology; Project administration; Resources; Visualization; Writing – original draft; Writing – review & editing.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Robin Kramer

Notes

References

Bobak, A. K., Dowsett, A. J., & Bate (2016). Solving the border control problem: Evidence of enhanced face matching in individuals with extraordinary face recognition skills. PLOS ONE, 11, e0148148. https://doi.org/10.1371/journal.pone.0148148

Burton, A. M., White, & McNeill (2010). The Glasgow face matching test. Behavior Research Methods, 42, 286–291. https://doi.org/10.3758/BRM.42.1.286

Dowsett, A. J., & Burton, A. M. (2015). Unfamiliar face matching: Pairs out-perform individuals and provide a route to training. British Journal of Psychology, 106, 433–445. https://doi.org/10.1111/bjop.12103

Elyoseph, Refoua, Asraf, Lvovsky, Shimoni, & Hadar-Shoval (2024). Capacity of generative AI to interpret human emotions from visual and textual data: Pilot evaluation study. JMIR Mental Health, 11, e54369. https://doi.org/10.2196/54369

Fysh, M. C., & Bindemann (2018). The Kent face matching test. British Journal of Psychology, 109, 219–231. https://doi.org/10.1111/bjop.12260

Hancock, P. J. B., Bruce, & Burton, A. M. (2000). Recognition of unfamiliar faces. Trends in Cognitive Sciences, 4, 330–337. https://doi.org/10.1016/S1364-6613(00)01519-9

Kobak, González-Márquez, Horvát, E.-Á., & Lause (2024). Delving into ChatGPT usage in academic writing through excess vocabulary. arXiv. https://doi.org/10.48550/arXiv.2406.07016

National Institute of Standards and Technology (2024, September). Face Recognition Technology Evaluation (FRTE) 1:1 Verification. U.S. Department of Commerce. https://pages.nist.gov/frvt/html/frvt11.html

Parde, C. J., Castillo, Sankaranarayanan, & O’Toole, A. J. (2019). Social trait information in deep convolutional neural networks trained for face identification. Cognitive Science, 43, e12729. https://doi.org/10.1111/cogs.12729

Phillips, P. J., Yates, A. N., Hahn, C. A., Noyes, Jackson, Cavazos, J. G., Jeckeln, Ranjan, Sankaranarayanan, Chen, J.-C., Castillo, C. D., Chellappa, White, & O’Toole, A. J. (2018). Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. Proceedings of the National Academy of Sciences, 115, 6171–6176. https://doi.org/10.1073/pnas.1721355115

Stacchi, Huguenin-Elie, Caldara, & Ramon (2020). Normative data for two challenging tests of face matching under ecological conditions. Cognitive Research: Principles and Implications, 5, 8. https://doi.org/10.1186/s41235-019-0205-0

Stantic, Brewer, Duchaine, Banissy, M. J., Bate, Susilo, Catmur, & Bird (2022). The Oxford face matching test: A non-biased test of the full range of individual differences in face perception. Behavior Research Methods, 54, 158–173. https://doi.org/10.3758/s13428-021-01609-2

Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045. https://doi.org/10.1038/s41598-023-41032-5

White, Guilbert, Varela, V. P. L., Jenkins, & Burton, A. M. (2022). GFMT2: A psychometric measure of face matching ability. Behavior Research Methods, 54, 252–260. https://doi.org/10.3758/s13428-021-01638-x

White, Phillips, P. J., Hahn, C. A., Hill, & O’Toole, A. J. (2015). Perceptual expertise in forensic facial image comparison. Proceedings of the Royal Society B: Biological Sciences, 282, 20151292. https://doi.org/10.1098/rspb.2015.1292

Face to face: Comparing ChatGPT with human performance on face matching