Abstract
Negotiation is a live, back-and-forth process, exactly the kind of human interaction that today's static AI benchmarks miss. We built interactive agent environments based on two classic game-theory paradigms, the one-shot Ultimatum Game and the open-ended Nash Bargaining task, to observe how large language models (LLMs) reason, cooperate, and compete as a deal evolves. Using the Harvard Negotiation Project's six principles (Interests, Legitimacy, Relationship, Options, Commitment, Communication), we scored a variety of LLMs across hundreds of rounds. Llama-3 generally struck the most effective bargains; Claude-3 leaned aggressive, maximizing its own gain but risking pushback; GPT-4 offered the fairest splits. The results spotlight both promise and pitfalls: today's top LLMs can already secure mutually beneficial deals, yet they still falter on consistency, legitimacy, and commitment when stakes rise. Our open-source benchmark invites human-factors researchers to probe these behaviors, design safer negotiation workflows, and study how mixed human-AI teams might unlock even better outcomes.
