Sage Journals: Discover world-class research

Abstract

As AI-powered negotiations spread, their psychological and relational impact remains unclear. The authors propose a novel generative adversarial network framework that trains a bot to aim for superior economic outcomes while appearing “human” (“algorithmic anthropomorphization”). In a bargaining game experiment, they compare this “superhuman” bot with two simpler alternatives: a bot that mimics human behavior and a purely efficient bot. The results show that (1) superficial anthropomorphization can make a bot seem human but does not improve subjective evaluations, (2) the efficient bot is so rational that it is easily exploited, undercutting its performance, and (3) the superhuman bot achieves superior economic results while appearing more human than actual humans. Yet even when bots act indistinguishably from humans, they may trigger an “uncanny valley” effect, lowering subjective evaluations regardless of performance. Because subjective evaluations predict future negotiation outcomes, these findings highlight the potential negative impact that AI bargaining algorithms can have on long-term customer relationships. The authors urge firms to measure more than objective outcomes when assessing AI negotiators.

Keywords

bargaining artificial intelligence generative adversarial network human–AI interactions

To concede that one's intelligence and skills are being surpassed by a machine's can be a disturbing experience, and with the rise of artificial intelligence, humankind will likely experience that feeling more and more frequently.

After playing AlphaGo and losing four of five games, Lee Sedol, a professional Go player, demonstrated an extreme case of “algorithm aversion” (Dietvorst, Simmons, and Massey 2015) and announced that he would retire from professional play, stating that “even if I become the number one, there is an entity that cannot be defeated,” that is, artificial intelligence (Yoo 2019). The chess community was similarly surprised when presented with the latest DeepMind development, AlphaZero. The AI taught itself how to play chess in less than four hours and developed a style that did not resemble anything experts had seen before. Grandmaster Peter H. Nielsen said: “I always wondered how it would be if a superior species landed on earth and showed us how they play chess. I feel now I know” (Klein 2017).

Marketing is rarely interested in such direct, face-to-face adversarial contexts. However, even when human interests align with an AI's objectives and users could benefit from collaborating with AI, they still tend to suffer from algorithm aversion (Dietvorst, Simmons, and Massey 2015). Nascent literature suggests several ways to attenuate such aversion, such as giving users control over the AI's output (Dietvorst, Simmons, and Massey 2018; Kawaguchi 2020), making the algorithm's decision-making process more transparent and understandable to its users (Yeomans et al. 2019), or anthropomorphizing the AI to foster a feeling of trust (De Visser et al. 2016; Waytz, Heafner, and Epley 2014). However, anthropomorphizing an AI to a point where it resembles a human being, but not quite, can also backfire. It may generate unease, discomfort, and even disgust, a phenomenon dubbed the “uncanny valley” (Mathur and Reichling 2016).

Bargaining is another essential human activity that has seen an uptick in AI applications. However, contrary to the domains in which AI is typically deployed, bargaining is neither purely adversarial nor purely collaborative; it stands in a gray area. To be successful, bargaining partners need to collaborate to reach an agreement. Yet, there is always a zero-sum-game component in any negotiation, where negotiators’ interests are partly misaligned, and one bargainer's gains may be the other bargainer's losses.

Despite this peculiarity, intelligent machines are increasingly being used to automate negotiations, and “AI has already begun to dramatically reshape the negotiation landscape” (Falcão Filho 2024b). An automated AI bargainer overcomes many of the shortcomings of human bargainers. A firm can perform one-on-one negotiations on a larger scale with a multitude of customers while maintaining consistent behavior across all its negotiations. Furthermore, an AI bargainer will not be affected by cognitive or emotional biases that could result in economic inefficiencies (Johnson et al. 2002). As a result:
Companies that provide autonomous negotiation products, such as Negobot, Aerchain, Pactum, or Lindy, are becoming popular for sale, procurement, or salary negotiations.

Major corporations like Walmart increasingly use AI agents to negotiate terms with suppliers (Van Hoek et al. 2022).

In 2023, Luminance demonstrated an AI that negotiated a nondisclosure agreement without human involvement (Brown 2023).

In an online interview, JLL's global CMO, Siddharth Taparia, described how the firm's in-house large language model enabled a real estate contract negotiation process to be completed in hours rather than months (Hood and Wei 2024).

Platforms like Retrograde act as “AI talent agents,” managing the initial rounds of negotiation with brands on behalf of social media influencers (McDowell 2024).

Unfortunately, research on the psychological and relational aspects of negotiation between humans and AI is not progressing at a pace that corresponds with the rate at which AI is being deployed in practice. We find this trend worrisome for two reasons.

First, the success of a good negotiation cannot be measured by its short-term financial result alone. To paraphrase previous examples, no customer wants to “negotiate with an alien” or be “defeated” by the firm they negotiate with. An AI that exclusively focuses on its economic outcomes and develops unorthodox bargaining tactics to gain an advantage may be perceived as unpleasant to negotiate with, unfair, or even exploitative; consequently, it may be poorly evaluated by customers. Subjective evaluations of negotiations are crucial for firms as they impact the relationship with the customers they bargain with and are directly proportional to the economic efficiency of any future negotiation (Curhan, Elfenbein, and Kilduff 2009). A firm deploying a ruthless bargaining AI may inadvertently sacrifice long-term customer relationships.

Second, while algorithm aversion appears to be of great concern in bargaining situations (as it is both likely to occur and detrimental to the firm's long-term profits), classic solutions do not apply. Because of the adversarial nature of bargaining, one cannot alleviate algorithm aversion by simply giving control over the AI's outputs (Dietvorst, Simmons, and Massey 2018; Kawaguchi 2020) or making the decision-making process more transparent (Yeomans et al. 2019) as both tactics would go against the firm's interests. Likewise, it is doubtful that giving the AI a human name and a smiling avatar (a process we label “superficial anthropomorphism”) would solve algorithm aversion altogether.

In this article, we examine whether it would be possible to develop an AI that behaves like a human (to reduce algorithm aversion) yet remains efficient enough to be of value to the firm. We develop the notion of “algorithmic anthropomorphism,” which we define as an agent that optimizes its objective outcomes while appearing humanlike in its behavior. We develop a novel generative adversarial network (GAN) framework to develop a bargaining bot that optimizes its economic outcomes while retaining human traits in the way it negotiates (a trade-off we control with a single parameter). We call it a superhuman bot as it behaves like a human yet displays abilities (i.e., rationality, consistency) that exceed those usually found in humans. We compare it with two other bot versions: a primitive bot designed to mimic a sample of human negotiators (including their quirks and irrationalities) and an efficient bot focused on maximizing one's economic outcome exclusively. Using simulations, we confirm that all bots achieve their respective goals.

We then asked 1,019 participants to play 20,380 unstructured, continuous-time bargaining games with asymmetric information (Camerer, Nave, and Smith 2019), representing many real-world bargaining scenarios (Karagözoğlu and Kocher 2019). They were paired either with one of the bargaining bots or with another human being for the experiment, and the bots were either depicted as such or portrayed as other human beings. We compare the objective performance of the bots as well as the subjective evaluations of the negotiations by the participants.

In the limiting context of our small-stakes bargaining game, we find that the superhuman and primitive bots behave in ways that are indistinguishable from humans (i.e., they both pass the Turing test) and does so in a far more humanlike manner than the efficient bot. Interestingly, the efficient bot does not perform as well in real life as our simulations predict: Human participants learn to exploit the extreme rationality of the bot to their advantage, especially when they are informed they are negotiating with a bot, up to a point where human negotiators manage to mitigate their bargaining disadvantage. The best performer is the superhuman bot, which pursues its objective rationally yet displays humanlike behaviors that are hard to exploit.

However, even though participants could not distinguish the superhuman bot from actual human beings, human-versus-human negotiations still achieved the highest subjective evaluations, and no bot version compared favorably to humans in terms of perceived negotiation outcome, process, or relationship. Although our exploration of the phenomenon is limited to a single, low-stakes bargaining game, our research may constitute an initial warning call that, even after AI reaches perfect anthropomorphism and excellent objective success metrics, asking an AI to negotiate on the firm's behalf may not be free from adverse long-term consequences on customer relationships.

We organize the article as follows. After conducting a literature review, we present the mechanics and design of the game that serves as an initial test bed for our bot development. We then describe the development of the three bot versions (primitive, efficient, superhuman) and present simulation results. We report the results of our controlled experiment and conclude with managerial implications and directions for future research.

Literature Review

Subjective Value Inventory

In negotiations, subjective evaluations matter. A bargainer can be psychologically affected by a negotiation, including, but not limited to, effects on their self-esteem, their satisfaction with the bargaining process, and their willingness to negotiate with their counterpart in the future (Curhan, Elfenbein, and Xu 2006). These subjective evaluations not only impact the long-term relationship between bargainers but also predict future negotiation outcomes (Curhan, Elfenbein, and Kilduff 2009).

In the literature, subjective evaluations of negotiation are typically measured using the Subjective Value Inventory, or SVI (Curhan, Elfenbein, and Xu 2006). The SVI has four subconstructs: feelings about the outcome, the self, the process, and the relationship. We report the survey items in Web Appendix A.

The SVI is a widely accepted measure for subjective evaluations of negotiations. Some examples of its use include research on how subjective evaluations are affected by negotiator creativity (De Pauw, Venter, and Neethling 2011) and negotiator deception (Van Zant, Kennedy, and Kray 2023). It has also been used in longitudinal studies to show that the subjective evaluations of an initial negotiation can negatively impact the objective outcomes of a subsequent negotiation due to a spillover of incidental anger and pride (Becker and Curhan 2018). The SVI has also been modified and used outside the domain of negotiation, such as to operationalize procedural justice and determine how it is affected by status and power (Blader and Chen 2012).

Algorithm Aversion

Algorithm aversion is a “biased assessment of an algorithm which manifests in negative behaviors and attitudes toward the algorithm compared to a human agent” (Jussupow, Benbasat, and Heinzl 2020, p. 4). It can manifest in people's distrust or reluctance to rely on advice from an automated system, whereas they would accept the same advice from another human being. Algorithm aversion has been observed in a variety of AI applications, including medical diagnostics (Dai and Singh 2020; Longoni, Bonezzi, and Morewedge 2019), forecasting (Dietvorst, Simmons, and Massey 2015, 2018), product assortment determination (Kawaguchi 2020), and selling through chatbots (Luo et al. 2019).

Given how widespread algorithm aversion appears to be, one can only wonder how it could manifest in a bargaining situation. If people generally distrust algorithms, the same negotiation outcome might be more negatively evaluated if it is obtained by negotiating with an AI than with a human being, with known adverse long-term consequences on relationships and future negotiation outcomes. Consequently, we conjecture:
H₁: Everything else being equal, a participant gives lower subjective evaluations when they negotiate against an AI than when they negotiate against another human negotiator.
However, previous research on algorithm aversion has been systematically conducted in contexts where the objectives of the AI and the interests of the users are aligned. Whether obtaining the correct diagnosis or improving product assortment, the AI and the users have the same objectives. Because of its competitive nature, bargaining is different from these applications; the interests of the bargainer do not perfectly align with the AI's objectives, and it is not clear how prior research may apply in this context. Although there has been some work on understanding bargaining against AI from a game theoretical point of view (see March [2021] for a review), the literature is silent on how algorithm aversion may manifest in bargaining.

Anthropomorphization

Prior research on human–AI interactions has examined possible ways to attenuate the effects of algorithm aversion. In forecasting, this can be accomplished by giving people some control over the forecasts (Dietvorst, Simmons, and Massey 2018; Kawaguchi 2020). For recommender systems, algorithm aversion can be reduced by making the algorithm's decision-making process more understandable to its users (Yeomans et al. 2019). However, for strategic interactions like bargaining, neither of these approaches is applicable as they would at least partially divulge the algorithm's strategy, thereby hurting the firm's profits.

In this article, we explore another way discussed in the literature to mitigate algorithm aversion, namely superficial anthropomorphization, which we define as giving superficial human characteristics, such as a name or physical appearance, to an AI. Anthropomorphizing an autonomous agent can foster a feeling of trust in its users (De Visser et al. 2016; Waytz, Heafner, and Epley 2014) and does not compromise the fundamental objective of the firm, hence making it ideally suited for alleviating the effects of algorithm aversion in bargaining situations. In the literature, human characteristics have usually been conveyed through avatars, audio, or textual communication (Aggarwal and McGill 2007; Crolic et al. 2022; Kwak, Puzakova, and Rocereto 2015; Wan, Chen, and Jin 2017) and have been shown to be quite effective. Hence, we expect:
H₂: Superficial anthropomorphization contributes to portraying an AI negotiator as human.

H₃: Portraying an AI negotiator as human through superficial anthropomorphization improves subjective evaluations.
However, when humans and AI interact, such superficial anthropomorphization solves only part of the problem, as the strategy itself can signal whether it originated from an AI bot or a human player. For example, AlphaGo radically diverged from commonly used strategies to defeat Lee Sedol. In the same vein, a bot that is trained to optimize its own payoffs in bargaining games will likely discover unorthodox (i.e., nonhumanlike) strategies to do so. Facebook's research team observed that their negotiating bots autonomously developed a new and unrecognizable language to find optimal bargaining solutions more efficiently (Lewis et al. 2017). In a time-bound bargaining setting, an AI may learn that responding immediately after a counterproposal might save time on the clock and be a more efficient tactic than pausing before responding; yet, it might be unsettling for a human bargainer to think carefully about the best strategy, only to observe that their counterpart needs a split second to respond. Hence, to appear humanlike in bargaining games, an AI has to be anthropomorphic not only in its appearance but also in the strategies it uses.

Some recent work has examined anthropomorphizing the strategies of AI in specific games (Jacob et al. 2022; McIlroy-Young et al. 2022), but these studies focus primarily on the strategic or algorithmic advantages of doing so rather than on their psychological consequences. They also concentrate on structured, turn-based games like Diplomacy and chess. They cannot be easily extended to unstructured bargaining games where any player can make an offer at any time. We address these gaps in the literature by building various AI bots specifically catered to unstructured bargaining games using deep learning techniques. We discuss the bargaining game mechanics in the next section.

Game Mechanics

We base our research on the game design proposed by Camerer, Nave, and Smith (2019). It is an unstructured bargaining game, played in continuous time, with a fixed deadline and one-sided private information about the pie size. Each game proceeds as follows:
A pie size is randomly selected from {1, 2, 3, 4, 5, 6} and assigned to the game.

Two players enter the game, with the pie size known to only one (the informed player).

Before the bargaining game begins, both players have 5 seconds to select their first offer. At this stage, neither player has information about what the other player is doing.

Once the game begins, each player bargains over the uninformed player's payoff. An offer of x from either player implies that the uninformed player will receive x, and the informed player will receive (pie size − x). Players can click on the upper part of their screen at any moment (see Figure 1) and as often as they please to make a proposal or counterproposal, between $0 and $6, in 10-cent increments. They do not need to wait for the other bargainer's response and can make multiple proposals in a row.

Any change in the other player's current proposal is displayed in real time in the lower part of the screen. If both players agree on the same amount, a green bar appears, indicating that they are in agreement (see Figure 1).

The game ends when both players remain in agreement for 1.5 seconds—in which case the pie is split accordingly between the two players—or automatically after 10 seconds of bargaining, in which case neither player receives any payoff. Hence, the latest time for reaching a successful agreement is 8.5 seconds.

In the end, both players are informed of the game's outcome. At that time, the uninformed player is also informed of the size of the pie they were negotiating over.

Figure 1.
Game Interface.

While this game format is a stylized form of negotiation, the literature reckons that it is representative of many real-world bargaining scenarios (Karagözoğlu and Kocher 2019).

An important aspect of this game is the information asymmetry. Asymmetric information is representative of many consumer-versus-firm bargaining scenarios, as firms tend to have more information regarding a product than a consumer (e.g., stock availability, current demand, actual production costs, part quality). Therefore, in the context of this research, we focus on human players assuming the role of uninformed players.

Methodology

Overview

We build three distinct AI bots operating with different degrees of anthropomorphization and efficiency:
primitive—a fully anthropomorphized bot that behaves exactly like human bargainers observed in Camerer, Nave, and Smith (2019).

efficient—a bot whose strategy is not anthropomorphized but instead focuses entirely on maximizing its own payoffs.

superhuman—a partially anthropomorphized bot that finds a balance between optimizing its payoffs and behaving like human beings.

Notice that we expect the superhuman bot's behaviors to fall between the primitive and the efficient bots’ behaviors. Because its design methodology builds on both approaches, we discuss its development last.

The superhuman bot is unique methodologically, and we could not find an equivalent in the literature. Typically, AI bots are either trained to be as efficient as possible and achieve the highest level of competency (e.g., to play Go, fly a fighter jet, or optimize a production line) or to mimic human beings in how they write, paint, or compose (i.e., generative AI). In particular, games solely focus on optimizing behavior (Lewis et al. 2017) or behaving like humans (Jacob et al. 2022; McIlroy-Young et al. 2022), but not both. Bargaining is a unique context in that competitive and collaborative goals intersect, and efficiency and anthropomorphism goals may need to coexist. Therefore, we design a novel GAN framework that simultaneously pursues both goals: efficiency and anthropomorphization (Figure 2).

Figure 2.
Overview of the Development Phases of the primitive, efficient, and superhuman Bots, Based on the Bargaining Game Data from Camerer, Nave, and Smith (2019).

We first explain the intuition behind the four-stage development of these three bots (Figure 1) and develop the technical aspects and formal hypotheses in subsequent sections.

First, we obtain the dataset from Camerer, Nave, and Smith (2019), where 110 participants collectively played 6,432 games designed according to the game mechanics described previously. The participant pool was divided into 55 pairs of one informed and one uninformed player, and each pair bargained for 120 rounds (after completing 15 rounds of practice).

Second, using supervised learning and calibrating the models on the dataset available from Camerer, Nave, and Smith (2019), we develop two primitive bots replicating human behaviors, patterns, and even quirks and irrationalities. The first bot is calibrated on the observations made on the informed players and subsequently tested in our experimental setting. The second bot is calibrated on the uninformed players and exclusively used internally as a sparring partner to train more advanced bot versions. Once the bots are calibrated, we let the primitive informed bot and the primitive uninformed bot bargain against one another for thousands of simulations and confirm they perfectly replicate the bargaining behaviors observed by Camerer, Nave, and Smith.

Third, using the primitive bots as starting solutions, we continue our simulations but allow the informed primitive bot to improve its behaviors through reinforcement learning, with the sole objective of improving its monetary payoff. Over time, this improved bot, which we label the efficient bot, while adapting its behaviors to a human(like) counterpart (simulated by a primitive uninformed bot), learns to act more “rationally.” For instance, it does not reject (out of spite) offers that would be beneficial, makes counteroffers more quickly to leave the other player additional time to think, and learns not to accept an offer in the last 1.5 seconds of the game, when an agreement could not be validated anyway.

Fourth, we repeat the previous step but train in parallel an additional deep learning model (a judge) that is tasked with identifying whether a bargaining sequence has been generated by a humanlike bargainer (i.e., the primitive bot) or by an efficient bot. The rewards of the bot are a combination of (1) how much money it makes and (2) how “undetected” by the judge it remains. Both components can be weighted differently, and we test various versions in which the judge's detection is given more or less weight. The efficient bot and the judge enter a game of “cat and mouse,” where the bargaining bot tries to adapt its behaviors to maximize its monetary payoffs while avoiding behaviors that would be identified as not humanlike. For instance, for the sole purpose of deception, the bargaining bot might learn to occasionally reject unfair offers (which is irrational from a monetary perspective but still very typical of human beings) or to wait an artificial period of time before making a counteroffer to “pretend” it is thinking as human beings do. In this modeling framework, often called a GAN, one deep learning model learns to generate new data (here, new bargaining behaviors) while avoiding detection against human standards. The resulting outcome is a bot that behaves within the bounds of what is considered humanlike but is more efficient than most humans. We call it a superhuman bot because, while retaining some human traits, this bot achieves levels of rationality that most humans do not display.

Deep Learning Model

It is difficult to explicitly model humanlike behavior in bargaining games with a finite set of equations, especially in continuous time. Therefore, we use deep learning techniques that allow models to learn complex patterns from data without any elaborate theoretical specification. We use a similar sequential deep learning architecture to model all AI bots. From a prediction perspective, these architectures lend well to applications like bargaining games, where each action in a game is dependent on all other actions that precede it. Specifically, we use a long short-term memory (LSTM) neural network (Hochreiter and Schmidhuber 1997). LSTM models perform well in prediction tasks that deal with sequential data and have been used in marketing applications (Sarkar and De Bruyn 2021; Valendin et al. 2022; Wang et al. 2022). The modeling process can be broken down into three distinct tasks, which we discuss in turn: feature engineering, policy formulation, and objective function specification.

Feature Engineering

We use feature engineering to extract input features at each step, defined as either (a) the beginning of the initial five-second period where players enter their initial offer, (b) the beginning of the up-to-ten-second bargaining game, (c) any offer or counteroffer observed during the bargaining session, or (d) the end of the game. Therefore, each sequence has a minimum of three steps (a, b, d) and an unspecified maximum length (a, b, c, c, c, …, d). Each input fully represents the state of the game at each step. We list the features in Table 1.

Table 1.
Features of the LSTM Model Used to Describe the State of the Game at Each Step of the Bargaining Process.

Feature Name Description

Duration The amount of time elapsed since the previous action.

Game start A dummy variable (0 or 1) that indicates the first step in the sequence.

Game end A dummy variable (0 or 1) that indicates the last step in the sequence.

Self game state Indicates whether the latest action of the player is to do nothing (0), make a counteroffer (1), or accept the other player's offer (2).

Self offer The player's current offer.

Other game state Indicates whether the latest action of the opponent is to do nothing (0), make a counteroffer (1), or accept the other player's offer (2).

Other offer The other player's current offer.

Fairness The extent to which the player's offer deviates from a fair split of the pie. For the uninformed player, this feature stores 0 as they do not have information about the pie size being bargained over, and hence will not know what a fair split is supposed to be in that game. For the informed player, this feature stores the value (self offer − (pie/2))/(pie/2).

Pie size Stores pie size for the informed player. The value is set to 0 for the uninformed player.

Notes: These features have been refined through trial and error to help the model learn as efficiently as possible. For instance, we found that specifying the feature “fairness” as a separate component (even though it could be inferred from “offer” and “pie size”) helps the model converge faster.

Policy Formulation

We use the parameters from the LSTM as inputs to a function that defines the effective action space of a bargainer. We refer to this as a policy likelihood function since it tells us the likelihood of any action that can be taken by an AI at each step of the game.

At each step i, the player can follow three distinct actions: make a counteroffer (c), accept the offer of the other player (a), or wait indefinitely until the next event, such as the other player's action or the end of the game (w). We note the probabilities of each action as p_c, p_a, and p_w, with p_c, p_a, p_w > 0 and p_c + p_a + p_w = 1 (for simplicity, we do not specify the index i).

If the player makes a counteroffer, the model needs to predict two additional quantities: how long the player will wait to make the offer (between zero and ten seconds) and what that counteroffer will be (between $0 and $6). Because predictions are probabilistic and will subsequently be used to generate behaviors stochastically, we model each quantity (time and amount) as a distribution rather than a scalar. In addition, we reckon that no standard distribution (e.g., normal, lognormal, or beta) will fully capture the complexity of patterns observed in real-life data. Hence, we model each distribution as a mixture of truncated Gaussian distributions (truncated to 0–10 seconds for time and $0–$6 for amounts). The exact number of components (i.e., the number of truncated Gaussian distributions) will subsequently be determined as hyperparameters of the model. Each truncated Gaussian component has three parameters: a mean (bounded within the acceptable interval), a standard deviation (>0), and a weight (a percentage indicating how much that component contributes to the overall distribution). Each of these parameters is, itself, an output of the LSTM model, and will be predicted based on the sequence of events observed up to this point in the game. We collectively denote Φ_c and φ_c as the cumulative distribution function (CDF) and probability density function (PDF) of the mixture of truncated Gaussian distributions used to predict the timing of the counteroffer and Φ_x and φ_x as the CDF and PDF of the mixture that captures the amount of that counteroffer.

If the player accepts the other player's current offer, we only model the time they wait until they accept, denoted as Φ_a and φ_a. We use the same modeling approach as used previously.

If the player decides to wait indefinitely, no other quantity needs to be predicted.

The LSTM model will take the entire sequence of events observed in the game so far (described by the features listed previously at each step) and predict p_c, p_a, p_w, φ_c, φ_x, and φ_a. The number of outputs of the model will depend on the number of components in each mixture of truncated Gaussian distributions,¹ which are themselves hyperparameters of the model.

Not all predictions are used at each step. For instance, if a player accepts the other player's offer, φ_c are φ_x are predicted but unused.

Objective Functions

While the LSTM architecture is identical for all bot versions, their strategic prerogatives are codified through their respective objective functions, which we describe next.

primitive bot

The objective of the primitive bot is to replicate as precisely as possible the behaviors observed among subjects of Camerer, Nave, and Smith (2019). Therefore, the parameters of the LSTM model are estimated through supervised learning to maximize a likelihood function. When modeling the behaviors of the informed (focal) player, we do not jointly estimate the behaviors of the uninformed (other) player; the latter are treated as “external” events by the model.

Suppose that, at step i of the game, a player is observed to make a counteroffer of amount x at time t. The probability of such an event can be denoted as
$π_{i, c} = p_{i, c} . φ_{i, c} (t_{i + 1}) . φ_{i, x} (x_{i + 1}),$
(1)
where p_i,c is the predicted probability that the player was indeed about to make a counteroffer at step i, and φ_i,c and φ_i,x are the PDFs of the mixture of truncated Gaussian distributions predicting both the time and the amount of such counteroffer. Note that we use the indices i + 1 for the time and the amount of the decision because these quantities become the new inputs of the LSTM model at the next step (i + 1) of the game.

If the player accepts the other player's offer, the likelihood simplifies to
$π_{i, a} = p_{i, a} . φ_{i, a} (t_{i + 1}) .$
(2)

However, if the player's opponent is the first to make a move instead, the likelihood needs to capture three distinct possibilities: (1) the focal player had decided to wait, (2) the focal player was about to accept the current proposal, but their opponent acted before they could, or (3) the focal player was about to make a counterproposal but did not have time to act before their opponent made a move. The likelihood then becomes
$π_{i, w} = p_{i, w} + p_{i, a} \times (1 - Φ_{i, a} (t_{i + 1})) + p_{i, c} \times (1 - Φ_{i, c} (t_{i + 1})) .$
(3)

Note the use of the CDF (Φ) rather than the PDF (φ) of the distributions here. The same logic applies if the game reaches the ten-second limit (i.e., the player planned to let the game end without acting or was about to accept the proposal but poorly managed their time).

Figure 3 summarizes the various components of the likelihood function that are activated based on the specific event observed at time t_i ₊ ₁.

Figure 3.
Components of the Likelihood Function Based on the Event (i.e., Focal Player's Action, Other Player's Action, End of Game) Observed at Time t_i.

By letting $1$ denote an indicator function that takes a value of 1 or 0 depending on the tactic that is actually observed at the next step (i + 1), and by remembering that the last step of each game (i = I) is the game ending, where no additional decision is to be made, we can write the full likelihood function as
$π = \sum_{i}^{I - 1} (π_{i, c} \times 1_{i + 1, c} + π_{i, a} \times 1_{i + 1, a} + π_{i, w} \times 1_{i + 1, w}) .$
(4)

We train the primitive bot by calibrating the model parameters on the data collected by Camerer, Nave, and Smith (2019) as examples of human-versus-human bargaining behavior. Its objective function is given in Equation 5, which we minimize.
${Loss}_{primitive} = - \sum_{n = 1}^{N} \ln (π (θ_{primitive} | X_{n})) .$
(5)

Because the sole objective of the primitive bot is to appear humanlike, and deep learning models are exceptional function approximators, we expect:
H₄: The primitive bot behaves in a way that is indistinguishable from humans.

efficient bot

The efficient bot uses a reinforcement learning approach to optimize the payoffs it receives in each game. Deep reinforcement learning (DRL) has been recently used to solve marketing problems like finding optimal targeting policies for firms (Liu 2022; Wang et al. 2022). In comparison, unstructured bargaining is a unique application as the offer amounts and timings are both continuous, and the environment has a fair amount of uncertainty. Hence, we use a DRL algorithm called policy gradient that optimizes policies instead of functions that estimate current and future rewards, such as state or action-value functions. As a result, it is better suited for environments where the agent deals with a continuous action space and can pursue stochastic policies (Sutton and Barto 2018). Using an LSTM architecture to formulate policies necessitates a particular kind of policy gradient known as a recurrent policy gradient (Wierstra et al. 2010). Accordingly, the objective function of the efficient bot is the product of the log-likelihood of observing a given policy in a game and the incremental reward the agent gets on acting according to that policy. Intuitively, if the bot follows a strategy with a high (resp. low) associated probability and receives a disappointing payoff, the reward function will be strongly (resp. weakly) penalized. The reward function is defined as
${Reward}_{efficient} = \frac{1}{N} \sum_{n = 1}^{N} \ln (π (θ_{efficient} | S_{n})) (R_{n} - b_{ρ (n)}) .$
(6)

The term ln(π(θ_efficient | S_n)) is the log-likelihood of observing a game sequence (S_n) from a policy parameterized by θ_efficient. The incremental reward is given by (R_n − b_p(n)), where n is the game index (∈[1, N]), R_n is the end-of-game payoff, and b_p(n) is the average reward² observed for a game with that particular pie size. Introducing a baseline is important in the policy gradient algorithm as it ensures the variance of rewards is manageable during the calibration process, thereby facilitating convergence (Williams 1992). Since the average expected reward evolves continuously (with learning), it is updated after each epoch using exponential smoothing.

Reinforcement learning requires thousands upon thousands of games to learn, and it is neither economically nor practically feasible to use humans as sparring partners during the training phase. Therefore, we train the efficient bot by letting it interact with the uninformed version of the primitive bot in simulated bargaining games. Using a human behavior simulator to train DRL models is quite common in the literature. For instance, in marketing, DRL-based sequential targeting models have been trained through interactions with other sequence models that simulate customer behavior (Wang et al. 2022). It allows the model to explore counterfactual human responses to policies that are not observable in the dataset and is more cost-effective than training against human bargainers (Figure 4).

Figure 4.
Reinforcement Learning Algorithm for the efficient Bot Training.

The reinforcement learning algorithm will train the efficient bot to become a better negotiator than the primitive bot; it should be able to “unlearn” inefficient but very humanlike strategies, such as making a counteroffer too late, rejecting lowball offers out of spite, or refusing to agree to a reasonable counteroffer by ego or miscalculations. Thus, we expect:
H₅: The efficient bot achieves better objective outcomes than the primitive bot.
However, in its pursuit of achieving better economic results, the efficient bot may engage in strategies that are atypical of human negotiators, and we expect their human counterparts to pick on them (even if the efficient bot is portrayed as human through superficial anthropomorphization):
H₆: The efficient bot is perceived as less humanlike than the primitive bot.

superhuman bot

The third bot is built on the concept of algorithmic anthropomorphization. We refer to it as a superhuman bot because it attempts to behave like a human being in the bargaining game while eliminating some of the inefficiencies characteristic of human bargainers. We use an adversarial modeling approach to train our superhuman bot. Adversarial modeling has been used in the machine learning literature in domains like cybersecurity (Martins et al. 2020; Rosenberg et al. 2021). GAN (Goodfellow et al. 2014) is a variant of adversarial modeling that has gained popularity in the AI research community and has been applied to a variety of contexts, including marketing (Anand and Lee 2023).

In our context, the training of the superhuman bot involves two independent models, such that one (the earner) tries to optimize its rewards in the bargaining game, while the other (the judge) punishes its adversary for straying away from humanlike behaviors in its quest to do so. The earner uses the same reinforcement learning approach and algorithm (with a small twist, discussed subsequently) as the efficient bot to optimize payoffs. The judge uses a supervised learning approach to determine whether, in a simulated game, the informed player was played by a primitive bot (a proxy for humanlike behavior) or by the earner.

For the judge, we simulate N₁ games with earner versus primitive bots (y_n = 1) and N₂ games with primitive versus primitive bots (y_n = 0) . The N₁ + N₂ games are labeled appropriately and submitted as input to the judge. The judge is an encoder LSTM that takes an entire bargaining game sequence S_n as input, forward-propagates it, and predicts the probability ŷ_n that the informed player in the nth sequence was played by an earner (vs. a primitive bot). Its loss function is a binary cross-entropy:
${Loss}_{judge} = - \frac{1}{N_{1} + N_{2}} \sum_{n = 1}^{N_{1} + N_{2}} y_{n} \ln ({\hat{y}}_{n}) + (1 - y_{n}) \ln (1 - {\hat{y}}_{n}) .$
(7)

Here, the earner's rewards are the sum of (1) its economic performance and (2) the prediction from the judge, such that the more the earner remains undetected in a game sequence, the higher its reward. At the extreme, if the earner is identified with absolute certainty (ŷ_n = 1), its payoff only stems from its economic performance. This mechanism incentivizes the earner to behave like humans in its strategies while trying to optimize its payoffs. Since the goals of the judge and the earner are misaligned, their interactions are adversarial in nature. To train the superhuman bot, Equation 6 is modified as follows, where the parameter λ controls how important it is for the earner to behave like humans. With λ = 0, the superhuman and efficient bot objective functions become identical:
${Reward}_{earner} = \frac{1}{N_{1}} \sum_{n = 1}^{N_{1}} \ln (π (θ_{earner} | S_{n})) ((R_{n} - b_{ρ (n)}) + λ (1 - {\hat{y}}_{n})) .$
(8)

Importantly, in traditional GAN algorithms, the earner's objective function is exclusively equal to (1 − ŷ_n), and there is no λ parameter; that is, they impose no trade-off between appearing humanlike and pursuing objective performance. The earner's goal is to create output that is as humanlike as possible. In computer science, GAN algorithms perform wonders because the judge (relying on deep learning models) is incredibly good at discriminating the earner’s behaviors from humans’ behaviors and creations. Any mismatch is spotted, imposing high standards on the earner to produce output indistinguishable from human beings. Relying on a traditional GAN approach and the superior ability of the judge to identify nonhuman outputs, the computer science community has generated painting, poetry, music, writing, and other art forms indistinguishable from human creations (e.g., Artist 2024).

But what is a blessing in most contexts is a curse in ours. If a judge is “unbeatable” at distinguishing an earner's bargaining behaviors from humans’, and if the penalty for nonhumanlike behavior is high, the earner will quickly learn to behave precisely like humans to maximize its rewards. Yet, if it learns to mimic behaviors perfectly to remain undetected, it cannot improve, and the superhuman bot will converge toward the primitive bot.

Alternatively, if the penalty for not behaving like humans is small, the superhuman bot will learn that ignoring such a penalty is the price to pay for improving the objective outcomes of the negotiation, and it will converge toward the efficient bot. We found that rather than creating a trade-off between efficiency and humanlike behaviors, traditional GAN algorithms produce extreme solutions that depend on the relative weight of the penalty.

To achieve our balanced objective, we develop an innovative GAN algorithm in which the judge learns asynchronously. This creates opportunities for the earner to pursue both objectives simultaneously, that is, to optimize its economic payoffs while acting humanlike. We subsequently demonstrate experimentally that this novel GAN approach works very well indeed.

Traditionally, both the earner and the judge are trained simultaneously, with the earner constantly trying to improve its payoffs while avoiding detection, hence maximizing its reward, and the judge constantly trying to detect the earner, hence minimizing its loss. However, this approach leads to all-or-nothing strategies that we wish to avoid. Instead, we use the approach reported in Figure 5, where we fully train the earner (initialized with the parameters of the primitive bot to guarantee a humanlike starting point), then the judge, then the earner again, and so on, sequentially.

Figure 5.
GAN Algorithm for the superhuman Bot Training.

In the first step of the main loop, the earner converges toward the efficient bot, and the judge quickly learns to identify such behaviors. In the second step, the earner learns to adapt to the new judge to avoid detection and converges to new, more subtle behaviors that the judge learns to identify as well. As the training process goes on, the judge gets increasingly varied examples of earners’ behaviors. The key of this algorithm is that the earners’ behaviors from all steps are retained to train the judge; that is, the data on which the judge is trained keep increasing (in traditional GAN algorithms, the most recent earner's behaviors replace older ones). Because the judge is trained on both recent and “obsolete” behaviors, it leaves the earner “space” to optimize both reward components simultaneously. After a few steps, the judge converges (the likelihood of distinguishing the earner's behaviors from humanlike behaviors stays within a tight range), and so does the earner, so we stop the algorithm. Therefore, due to the adversarial nature of the training process, the earner gets better at appearing more anthropomorphic in its strategies as it learns to deceive an increasingly adept judge. In this way, the superhuman bot tries to earn high payoffs while appearing humanlike. Thus, we expect the superhuman bot to reach a compromise:
H₇: The superhuman bot achieves better objective outcomes than the primitive bot.

H₈: The superhuman bot is perceived as more humanlike than the efficient bot.

Training Details

We train all our models using the Adam optimizer (Kingma and Ba 2014). We prevent overfitting by using L2 regularization (weight decay). We tune hyperparameters of learning rate, weight decay, number of nodes in each neural network, and number of mixtures in each truncated Gaussian mixture model distribution by using the Bayesian optimization technique (Snoek, Larochelle, and Adams 2012). Details of hyperparameter tuning are provided in Web Appendix C.

Simulations

We conduct a simulation study to evaluate the performance of five bot versions: the primitive bot; the superhuman bot with λ = .5, λ = .3, and λ = .1; and the efficient bot. The lower (resp. higher) the value λ, the less the bot is incentivized to behave like humans (resp. efficiently). With λ = .1 (resp. .5), the superhuman bot should behave more similarly to the efficient (resp. primitive) bot.

The idea behind this study is to generate synthetic bargaining games using our models against an uninformed primitive bot. Again, the uninformed primitive bot is a proxy for humanlike behavior. As in Camerer, Nave, and Smith (2019), the pie size is randomly selected from a uniform distribution over {$1, $2, $3, $4, $5, $6}. For completeness, we also report the results of human-versus-human games observed by Camerer, Nave, and Smith in a similar setting.

We summarize the economic performance of each bot in Table 2, which reports key statistics across all simulations. We observe that the primitive bot behaviors (second column) are strikingly similar to human bargainers’ (first column, reported for reference), as observed by Camerer, Nave, and Smith (2019). In particular, the primitive bot simulations replicate multiple key findings reported by Camerer, Nave, and Smith, such as (1) the number of deals increases with pie sizes, (2) most deals happen in the last few seconds, and (3) the fair splits are most likely to occur for small and medium pie sizes than for large ones. As displayed in Figure 6, the distributions of behaviors across those key indicators are also strikingly similar (H₄), demonstrating the ability of deep learning models to replicate humanlike behaviors.

Figure 6.
Distribution of Key Behavioral Indicators from the Informed Players.

Table 2.
Key Statistics for the Simulations.

← Fully Human Fully Efficient →

Human vs. Human primitive vs. primitive superhuman vs. primitive superhuman vs. primitive superhuman vs. primitive efficient vs. primitive

(Camerer, Nave, and Smith 2019) (λ = .5) (λ = .3) (λ = .1)

Acceptance rate

Overall .50 .51 .43 .58 .60 .77

Pies $1–$4 .42 .41 .35 .46 .49 .71

Pies $5 and $6 .66 .70 .59 .81 .82 .90

Average payoffs—including strikes

Informed .28 (.31) .28 (.32) .26 (.33) .33 (.35) .37 (.36) .46 (.34)

Uninformed .22 (.27) .23 (.27) .17 (.25) .25 (.29) .23 (.28) .31 (.28)

Average payoffs—if agreement

Informed .55 (.20) .55 (.22) .60 (.23) .57 (.26) .61 (.26) .59 (.26)

Uninformed .45 (.20) .45 (.22) .40 (.23) .43 (.26) .39 (.26) .41 (.26)

Average number of counteroffers

Informed 2.22 (1.67) 2.16 (1.46) 2.42 (1.79) 2.19 (1.23) 1.61 (.82) 5.24 (2.89)

Uninformed 2.00 (1.03) 1.97 (.95) 2.06 (.96) 2.10 (.92) 2.12 (.93) 2.06 (.75)

Average time to acceptance 7.54 (2.27) 7.34 (2.20) 7.49 (2.16) 7.26 (1.87) 7.13 (2.00) 6.54 (1.87)

Percentage of agreements

0 to 6 seconds 18.27 18.50 15.08 27.80 31.86 53.20

6 to 8.5 seconds 32.31 29.04 26.93 29.15 31.05 35.09

8.5 to 10 seconds 21.52 20.93 19.76 15.42 18.21 7.74

Percentage of fair splits

Overall 33.78 24.34 13.77 17.98 15.39 14.37

Pies $1–$4 44.18 32.33 16.95 23.40 19.20 15.97

Pies $5 and $6 20.90 14.88 9.86 11.98 10.85 11.92

Bot detection N.A. .0% 31.0% 59.6% 69.4% 100.0%

Notes: The earnings are expressed as percentages of pie size; that is, a value of .28 means that the player earns, on average, 28% of the pie. The sum of both players’ earnings (when including strikes) is not equal to one due to players not always reaching an agreement, hence earning $0. In case of agreement, they are. Standard deviations are in parentheses, not reported for binary outcomes. N.A. = not applicable.

The efficient bot achieves superior economic performance (H₅) and diverges from the primitive bot behaviors (H₆) in four distinctive ways:
It cares far less about fairness. The number of fair splits (defined as when each player gets between 45% and 55% of the pie) drops from 24.34% to 14.37%.

The number of agreements within the last 1.5 seconds drops from 20.93% to 7.74%. An agreement reached in the last 1.5 seconds leads to a strike. The efficient bot tries to avoid such blunders but cannot prevent them all due to the human player's behaviors.

The optimal strategy in this game is to secure as many deals as possible, and the efficient bot makes such an outcome more likely (from 51% agreement rate to 77%).

The efficient bot agrees faster to a deal and acts frantically during the negotiation process, making an average of 5.24 counteroffers in a game (against 2.16 for the primitive bot). Knowing that 77% of games lead to an agreement, and more than half of these agreements (53.2%) occur in the first 6 seconds, this represents many fast-paced counteroffers in only a few seconds.

As expected, the superhuman bot shares more similarities with the primitive bot when the penalty for not acting humanlike is high (λ = .5) and more with the efficient bot when the penalty is low (λ = .1). A value of λ = .3 leads to behaviors about halfway between those seen with the higher and lower penalty values. Likewise, the lower the value of λ, the easier it is for the judge to distinguish bargaining sequences generated by the superhuman bot, as reported in the last row³ of Table 2.

We report in Figure 6 the distributions of four key behavioral indicators: counteroffer amounts, timings of counteroffers, acceptance offer amounts, and timings of deal acceptance. We plot those distributions against the baseline of human behavior observed by Camerer, Nave, and Smith (2019). The distributions of the primitive bot (left) are remarkably similar to those reported by Camerer, Nave, and Smith, confirming that our modeling framework replicates bargaining behaviors observed in their sample well. The efficient bot (right), in contrast, displays behaviors markedly dissimilar to the human benchmark. The superhuman bot (center), with λ = .3, while still outperforming the primitive bot (H₇) does not depart from the baseline as much as the efficient bot does (H₈).

Next, we test all five bot versions against actual human players in a controlled experiment setting.

Experiment

Experimental Design

In this controlled experiment, we want to measure the impact of algorithmic and superficial anthropomorphization on both objective bargaining outcomes and subjective bargaining evaluations. Algorithmic anthropomorphization is manipulated at five levels, each corresponding to one of the bot versions: primitive, superhuman (λ = .5, λ = .3, and λ = .1), and efficient. We test the effect of superficial anthropomorphization by manipulating whether the AI identity is disclosed (with a robot avatar and a “BargainBot” name) or undisclosed (with a human avatar and a human name; see, e.g., Luo et al. 2019). The opponent's avatar and name are displayed to participants on the screen throughout the experiment.

For completeness, in addition to the 5 × 2 experimental design, we also add a control condition in which humans bargain against humans.

Within each condition, participants were recruited from Prolific Academic and asked to play 20 rounds of a bargaining game and then respond to a questionnaire. For all 20 rounds, they played the role of the uninformed player, whereas their opponent (human or bot) played the role of the informed player.

At the beginning of the experiment, participants were asked to select a human avatar (out of 20 available) and enter their first name. They were then shown a two-minute tutorial video (with subtitles), which they could not skip and could replay at will. Then, they were directed to an online waiting room, where they were paired against a randomly assigned opponent. This opponent could be another human player (also recruited from Prolific) or a bot from one of the 5 × 2 main experimental conditions. If a player was paired with a bot, we replicated the waiting room experience as though they were paired with a human (i.e., half of the participants were paired immediately as if their opponent was already waiting, while the other half were asked to wait a random length of time while waiting for their opponent to “arrive”).

From a technical point of view, the AI models were deployed on the same GPU-optimized server as the one where the experiment took place, using a flask web framework, hence ensuring minimal latency (less than 1/50 second on average) between any change in the game status and the AI's response.

As discussed previously, the design of each game was identical to that of Camerer, Nave, and Smith (2019), with the exception that participants only played 20 rounds, instead of the 100 or more rounds played in Camerer, Nave, and Smith. Since our respondents are more representative of the general population—unlike economics students in a university lab—they could not be expected to play the same game for an extended period without losing interest or dropping out.

Once the 20 rounds concluded, respondents were invited to complete an online questionnaire, including subjective evaluations of their bargaining experience using the SVI (Curhan, Elfenbein, and Xu 2006). The SVI scale comprises 16 items categorized into four underlying subconstructs: feelings about the outcome, the self, the process, and the relationship. The questionnaire also included a Turing test administered using a single item: “Regardless of what you have been told, your counterpart in the negotiations may have been either an artificial-intelligence bargaining bot or an actual human being. Do you think you have negotiated with a human partner or a bargaining bot?” (1 = “Definitely a bot,” and 7 = “Definitely a human player”). The detailed questionnaire is provided in Web Appendix A.

The experiment was incentive-aligned. In addition to a show-up fee, participants were paid $.02 for every experimental dollar they earned. We consider the first five games for each participant to be a learning period and exclude them from our analysis. We collected data for 1,019 uninformed players, resulting in 15,285 games.

Results

Analyses

We report the descriptive statistics of all the variables of interest (objective outcomes and subjective evaluations) in Table 3 (superficial anthropomorphization with human avatar and name) and Table 4 (full identity disclosure with a bot avatar and “BargainBot” name). In addition, in Table 5, we use regression analyses to analyze the extent to which uninformed players realize they have bargained against a human player or a bot (i.e., the Turing test) and how they subjectively judged their negotiation experiences (i.e., SVI; Curhan, Elfenbein, and Xu 2006). We expect the Turing test and the SVI to be influenced by our experimental manipulations, namely, the kind of bot they bargained against (primitive, superhuman, or efficient) and whether or not the identity of the bot was disclosed. For completeness, we include the informed and uninformed players’ earnings as control variables. For the subjective evaluations, we also include the results of the Turing test as an additional control variable. In those regressions, the human-versus-human condition is the control condition, captured by the intercept, and the parameter values indicate to what extent the results are impacted by our manipulations and control variables.

Table 3.
Descriptive Statistics (Experiment on Prolific) for the Objective Outcomes and Subjective Evaluations When the Bots (Playing the Informed Players) Are Portrayed with a Human Avatar and Name.

← Fully Human Fully Efficient →

Human vs. Human primitive vs. Human superhuman vs. Human superhuman vs. Human superhuman vs. Human efficient vs. Human

(This Article) (λ = .5) (λ = .3) (λ = .1)

N = 92 N = 80 N = 98 N = 90 N = 93 N = 106

Objective Outcomes

Acceptance rate

Overall .74 .58 .56 .67 .58 .67

Pies $1–$4 .73 .53 .52 .59 .48 .63

Pies $5 and $6 .78 .73 .68 .94 .84 .78

Average payoffs—including strikes

Informed .33 (.15) .36 (.15) .29 (.19) .39 (.15) .30 (.16) .32 (.16)

Uninformed .41 (.14) .22 (.07) .26 (.11) .28 (.07) .28 (.11) .34 (.14)

Average payoffs—if agreement

Informed .44 (.60) .62 (.28) .53 (.70) .59 (.37) .51 (.59) .48 (.61)

Uninformed .56 (.60) .38 (.28) .47 (.70) .41 (.37) .49 (.59) .52 (.61)

Average number of counteroffers

Informed .54 (.62) 1.75 (.59) 1.59 (.54) 1.62 (.66) 2.10 (.45) 4.33 (.79)

Uninformed 1.22 (.63) 1.18 (.69) 1.26 (.66) 1.07 (.57) .88 (.79) 1.68 (1.38)

Average time to acceptance 4.74 (4.27) 6.00 (3.80) 6.05 (4.19) 5.96 (6.28) 6.90 (3.78) 6.27 (4.89)

Percentage of agreements

0 to 6 seconds 63.7 39.3 38.5 44.4 29.6 43.6

6 to 8.5 seconds 12.2 19.4 17.7 23.2 29.3 25.3

8.5 to 10 seconds 5.1 23.2 30.7 23.4 24.2 30.4

Percentage of fair splits

Overall 62.00 24.18 11.44 24.43 20.48 20.40

Pies $1–$4 64.49 30.93 14.08 30.72 24.63 22.40

Pies $5 and $6 55.18 8.97 5.36 11.72 13.77 15.21

Subjective Evaluations

Turing test 3.40 (1.46) 3.61 (1.79) 3.98 (1.82) 4.11 (1.71) 3.87 (1.80) 2.93 (1.61)

Feelings about …

the outcome 5.18 (1.43) 3.83 (1.30) 3.91 (1.51) 4.38 (1.21) 4.61 (1.23) 4.42 (1.34)

the self 4.81 (.88) 4.58 (.94) 4.38 (.89) 4.75 (.69) 4.83 (.84) 4.59 (.79)

the process 4.90 (1.60) 3.43 (1.50) 3.38 (1.60) 3.88 (1.41) 4.41 (1.44) 4.00 (1.45)

the relationship 4.95 (1.70) 3.21 (1.59) 3.20 (1.71) 3.70 (1.63) 4.33 (1.49) 3.90 (1.56)

Notes: Standard deviations are in parentheses, not reported for binary outcomes.

Table 4.
Descriptive Statistics (Experiment on Prolific) for the Objective Outcomes and Subjective Evaluations When the Bots (Playing the Informed Players) Are Portrayed with a Bot Avatar and the Name “BargainBot.”

← Fully Human Fully Efficient →

Human vs. Human (This Article) primitive vs. Human superhuman vs. Human (λ = .5) superhuman vs. Human (λ = .3) superhuman vs. Human (λ = .1) efficient vs. Human

N = 92 N = 90 N = 91 N = 96 N = 93 N = 90

Objective Outcomes

Acceptance rate

Overall .74 .60 .63 .68 .59 .76

Pies $1–$4 .73 .56 .61 .61 .51 .73

Pies $5 and $6 .78 .74 .70 .89 .83 .86

Average payoffs—including strikes

Informed .33 (.15) .36 (.13) .34 (.20) .40 (.17) .32 (.20) .38 (.19)

Uninformed .41 (.14) .24 (.08) .29 (.13) .28 (.07) .27 (.12) .38 (.14)

Average payoffs—if agreement

Informed .44 (.60) .60 (.34) .54 (.73) .59 (.40) .55 (.76) .50 (.72)

Uninformed .56 (.60) .40 (.34) .46 (.73) .41 (.40) .45 (.76) .50 (.72)

Average number of counteroffers

Informed .54 (.62) 1.78 (.49) 1.45 (.53) 1.48 (.74) 2.12 (.46) 4.51 (1.18)

Uninformed 1.22 (.63) 1.51 (.81) 1.46 (.78) 1.19 (.81) 1.47 (1.53) 2.58 (2.00)

Average time to acceptance 4.74 (4.27) 6.52 (3.85) 5.97 (4.57) 5.51 (5.95) 6.58 (4.35) 5.95 (3.61)

Percentage of agreements

0 to 6 seconds 63.7 35.3 43.8 51.3 34.8 57.4

6 to 8.5 seconds 12.2 25.9 19.7 17.3 25.6 22.2

8.5 to 10 seconds 5.1 25.7 26.6 21.5 20.7 19.8

Percentage of fair splits

Overall 62.00 25.19 11.20 21.69 17.02 19.05

Pies $1–$4 64.49 32.06 13.65 25.26 19.32 18.64

Pies $5 and $6 55.18 8.78 3.95 14.97 12.98 20.05

Subjective Evaluations

Turing test 3.40 (1.46) 2.41 (1.49) 2.26 (1.46) 2.27 (1.44) 2.47 (1.56) 1.96 (1.12)

Feelings about …

the outcome 5.18 (1.43) 3.94 (1.35) 4.00 (1.37) 4.50 (1.39) 4.37 (1.30) 4.73 (1.38)

the self 4.81 (.88) 4.50 (.82) 4.45 (.79) 4.64 (.93) 4.61 (.84) 4.76 (.80)

the process 4.90 (1.60) 3.58 (1.36) 3.38 (1.58) 3.91 (1.57) 4.02 (1.44) 4.37 (1.36)

the relationship 4.95 (1.70) 3.46 (1.49) 3.35 (1.71) 3.81 (1.55) 3.90 (1.57) 4.24 (1.42)

Notes: Although human players are never portrayed with a bot avatar, human-versus-human games are reproduced here for reference. Standard deviations are in parentheses, not reported for binary outcomes.

Table 5.
Results of Regression Analyses.

Turing Test Feelings About the Outcome Feelings About the Self Feelings About the Process Feelings About the Relationship

Intercept 3.682* 3.611* 4.124* 3.039* 3.079*

(.267) (.24) (.152) (.262) (.282)

primitive bot .141 −.940* −.050 −1.020* −1.270*

(.227) (.187) (.119) (.205) (.220)

superhuman bot (λ = .5) .259 −.850* −.170 −1.120* −1.310*

(.222) (.184) (.117) (.200) (.216)

superhuman bot (λ = .3) .334 −.600* .021 −.890* −1.100*

(.218) (.180) (.114) (.197) (.212)

superhuman bot (λ = .1) .359 −.430* .073 −.400* −.590

(.216) (.179) (.113) (.195) (.210)

efficient bot −.330 −.490 −.030 −.570 −.750*

(.208) (.172) (.109) (.187) (.202)

Earnings (self) −.020* .045* .023* .046* .046*

(.009) (.007) (.005) (.008) (.009)

Earnings (other) .008 .022* .004 .031* .027*

(.005) (.004) (.003) (.005) (.005)

Robot avatar −1.410* .078 −.010 .059 .149

(.104) (.094) (.059) (.102) (.110)

Turing test — .042 .028 .067* .093**

— (.026) (.016) (.028) (.031)

R² .184 .136 .051 .159 .148

Adjusted R² .177 .128 .043 .151 .141

p < .05.

p < .01.

p < .001.

Notes: Subjective evaluations (column headings) are regressed against experimental manipulations and control variables. We observe no learning trends, and results remain stable across rounds. Estimates with p-values less than .05 are highlighted in bold. Standard errors are in parentheses.

Regarding the SVI, we run a confirmatory factor analysis (see Web Appendix D) that shows that items corresponding to each subgroup have significant loadings on the latent constructs they represent. In line with Curhan, Elfenbein, and Xu (2006), we define each construct as an average of the items that load onto it, per the authors’ recommendations.

Algorithm aversion and anthropomorphization (H₁–H₃)

Identity disclosure has a strong negative effect (Table 5; β = −1.410, p < .001) on the Turing test. Superficial anthropomorphism (i.e., not revealing the AI's identity) significantly impacts the players’ perceptions that they are, indeed, playing against a fellow human being. While this effect is fairly obvious, it implies that superficial anthropomorphization effectively hides the identity of AI bots in bargaining, at least partly, which strongly supports H₂.

In terms of subjective evaluations, feelings about the self are the only component of the SVI where only the uninformed (self) player’s earnings have a significant impact on the regression analysis (β = .023, p < .001). It is fairly intuitive; the more one earns, the more one feels good about one's abilities and, eventually, oneself. The three other components of the SVI (feelings about the outcome, process, and relationship) display similar patterns. First, uninformed and informed player earnings positively influence subjective evaluations, although self-earnings have a greater impact than other-earnings. Second, all bots negatively affect subjective evaluations, offering strong support for H₁. Surprisingly, however, superficial anthropomorphization (i.e., robot avatar) has no significant effect in any of the regression analyses. Neither feelings about the outcome, the self, the process, or the relationship are affected by whether bargainers are told they are bargaining against a bot or a human being. This is interesting because, in the literature, superficial anthropomorphization has been shown to be effective in improving subjective evaluations and reducing algorithm aversion. Yet, we find no evidence of algorithmic aversion in this bargaining context. In other words, H₁ is confirmed, but not for the reasons we anticipated (more on this subsequently).

Consequently, because negotiators do not seem to care whether they are told they are bargaining against another human being or an AI, portraying an AI as a human being using superficial anthropomorphization does not improve anything (thus, H₃ is rejected). Superficial anthropomorphization can help portray bots as humans, but ultimately, this does not change how human beings judge their interactions with the bot. Algorithm aversion has been demonstrated many times over in collaborative environments. The fact that we fail to replicate it in a bargaining (i.e., partly adversarial) environment is noteworthy.

Interestingly, while being told they are bargaining against a human being is inconsequential for participants (contrary to predictions from the algorithmic aversion literature), believing they are bargaining against a fellow human is not. The Turing test influences feelings about the outcome (β = .042, p < .1), the process (β = .067, p < .05), and the relationship (β = .093, p < .001).

primitive bot (H₄)

In the simulations, we showed that the primitive bot's behaviors were indistinguishable from those observed by Camerer, Nave, and Smith (2019). Likewise, in the experiment, because the primitive bot was trained with the sole objective of behaving like humans, it does so, and as far as the Turing test is concerned, we do not observe any statistically significant differences between the human baseline and the primitive bot, offering support for H₄: The primitive bot behaves in a way that is indistinguishable from human behaviors.

While being judged “humanlike” in its behaviors, the primitive bot follows a strategy different from that of human participants in our sample. The primitive bot commands a far larger share of the pie: In case of agreement, the informed player (bot) earns an average of 62% (human avatar) or 60% (bot avatar) of the pie versus 44% with the Prolific sample (p < .001). Due to the greedy demands of the primitive bot, however, fewer agreements are reached, and, consequently, the primitive bot fares no better than its human counterparts in human-versus-human games (payoffs averaged per game and scaled to pie size: .36 for the primitive bot vs. .33 for human informed players, n.s.). The informed player does not benefit from asking for a larger share of the pie since fewer agreements compensate for any increase in absolute payoffs. However, the uninformed (human) player pays a dire price: This player’s average payoff over 15 games is reduced by about 40% (from .41 to .22/.24; here and subsequently, the xx/yy figures refer to the human/bot avatar conditions). Since the primitive bot was successfully calibrated to replicate human behaviors in its training examples, these differences can only be explained by differences in sample compositions: Human players in our sample are more representative of the general population, whereas human players in Camerer, Nave, and Smith (2019) are young economics students from higher education institutions (more on this subsequently).

efficient bot (H₅ and H₆)

In the simulations, the efficient bot achieved better objective outcomes than the primitive bot, both in terms of informed player earnings (.46 vs. .28), uninformed player earnings (.31 vs. .23), and agreements (77% vs. 51%). Interestingly, the promises of the simulations do not fully materialize in real life.

While the efficient bot outperforms the primitive bot in terms of agreements (67%/76% vs. 58%/60%, both p < .01) and helps the uninformed player achieve superior payoffs (.34/.38 vs. .22/.24, both p < .001), it does not improve the most critical metric, namely the bot's own financial performance (.32/.38 vs. .36/.36, p = .216 and p = .540, respectively). Therefore, we must reject H₅. We will investigate the reasons for this unexpected result shortly.

In terms of anthropomorphism, the efficient bot adopts strategies atypical of human beings, such as making counteroffers in rapid succession or responding to proposals in a fraction of a second. When bots are portrayed with a human avatar, participants spot the trickery more easily with the efficient bot than with the primitive bot (Turing test: 2.93 vs. 3.61, p = .007). Even when they are informed that their negotiator counterpart is a bot, the difference in subjective evaluations remains significant (1.96 vs. 2.41, p = .022). These results hold even after controlling for rewards and avatars (see Table 5; β = −.330 vs. β = .141, p < .01), strongly supporting H₆.

superhuman bot (H₇ and H₈)

For clarity, we will focus on the superhuman bot with λ = .3, as the other versions resemble, to a greater extent, the primitive (when λ = .5) and the efficient (when λ = .1) bots we have already discussed.

In the simulations, the superhuman bot (λ = .3) managed to increase the ratio of agreements from 51% to 58% compared with the primitive bot. This increased agreement rate benefited the uninformed player (average payoff improved from .23 to .25). The payoff of the informed player, played by the superhuman bot, was significantly improved as well (from .28 to .33).

The same patterns occur in the experiment. Compared with the primitive bot, the more “rational” behaviors of the superhuman bot improved the agreement rate (from 58%/60% to 67%/68%, both p < .01), the uninformed player earnings (from .22/.24 to .28/.28, both p < .001), and, more critically, the informed player earnings (from .36/.36 to .39/.40, p = .027 and p = .003). Thus, H₇ is strongly supported.

While improving all performance metrics, the superhuman bot does so while appearing very humanlike. Compared with the efficient bot, the differences in the Turing tests are strongly significant in the human avatar condition (4.11 vs. 2.93, p < .001) and marginally so in the robot avatar condition (2.26 vs. 1.96, p = .096). After controlling for payoffs and avatars (see Table 5), the difference is strongly significant (β = .334 vs. β = −.330, p < .01), supporting H₈.

Astoundingly, the superhuman bot in the human avatar condition is perceived to be more humanlike than actual human players (4.11 vs. 3.40, p < .001). This finding echoes recent research that reported that AI-generated poetry was rated “more human” than human-written poetry and was rated more favorably (Porter and Machery 2024).

Understanding the efficient bot’s underperformance

On the one hand, the simulations predicted a highly performing efficient bot, but these promises did not fully materialize in the field study. On the other hand, the superhuman bot achieved remarkable results, which were in line with the simulations, so much so that, in terms of bot financial performance, it even outperformed the efficient bot (average payoffs of .39/.40 vs. .32/.38, p < .001 and p = .02). How come?

The efficient bot's strategy is highly mechanistic and, to the astute eye, predictable. It tries to obtain the highest share of the pie for the first seconds, but as the game approaches the critical 8.5-second mark (after which any agreement will be ignored), it tends to accept any offer on the table as long as it generates some payoff. From an economic perspective, it is indeed optimal since any payoff is better than none. However, once the pattern has been identified, it is highly exploitable by the other party.

The superhuman bot is far less predictable. For instance, it would randomly open the bidding by offering nothing and wait for the uninformed player to make counteroffers until one is deemed acceptable. If the uninformed player is not responsive, the superhuman bot may make an offer toward the end of the game—or it may not, and let the clock run out, mimicking “irrational” human behaviors. In other words, the superhuman bot has discovered that “in negotiation, the first one to talk loses” (a popular adage in negotiation literature). To the uninformed player, this tactic appears very humanlike—that is, if the human negotiator is particularly ruthless. It is also extremely hard to counter. The authors have negotiated countless times against the superhuman bot and, despite our intricate knowledge of the algorithm, have been “taken to the cleaners” far more often than we wish to admit.

In Table 6, we report the results of a regression analysis where we explain how often a bot is “exploited” by the uninformed player, that is, how often the agreement gives more than half of the pie to the latter—who is, in theory, at a marked disadvantage. It appears that the more “economically rational” a bot is, the more often it is exploited. The parameter goes from β = −.904 (p < .001) for the primitive bot to β = 2.444 (p < .001) for the efficient bot. Interestingly, when human players are informed that they are, indeed, negotiating against a bot, this effect is amplified (β = .405, p < .01).

Table 6.
Regression Analysis.

Bot Exploitation

Intercept 2.130*

(.208)

primitive bot −.904*

(.267)

superhuman bot (λ = .5) −.521*

(.261)

superhuman bot (λ = .3) −.624*

(.263)

superhuman bot (λ = .1) .253

(.262)

efficient bot 2.444*

(.259)

Robot avatar .405

(.131)

R² .266

Adjusted R² .261

p < .05.

p < .01.

p < .001.

Notes: The dependent variable is the number of rounds (out of 15) where the informed player (played by a bot) conceded more than half of the pie to the uninformed player (played by a human being). Estimates with p-values less than .05 are highlighted in bold. Standard errors are in parentheses.

In theory, the unpredictability and irrationality of humanlike behaviors should make them less profitable, and our simulations predicted that the efficient bot would surpass all others. In practice, however, the predictability of economically efficient tactics makes them highly exploitable, up to a point where the uninformed players manage to negate their structural disadvantage and make as much money as the efficient bot (.34/.38 vs. .32/.38).

We summarize all the key results of this research in Table 7.

Table 7.
Summary Results of the Seven Hypotheses, as Assessed in Our Simulation Study and Controlled Experiment.

Hypothesis Simulations Experiment Comments

H₁ Everything else being equal, a participant gives lower subjective evaluations when they negotiate against an AI than when they negotiate against another human negotiator. N.A. ✓ Although the results match predictions from the algorithm aversion literature, the underlying mechanism is not based on whether participants are told they are negotiating with an AI (as predicted by this literature), but if they are. We conjecture that this effect is caused by the “uncanny valley” phenomenon (see “Conclusions”).

H₂ Superficial anthropomorphization contributes to portraying an AI negotiator as human. N.A. ✓

H₃ Portraying an AI negotiator as human through superficial anthropomorphization improves subjective evaluations. N.A. ✗ Superficial anthropomorphization contributes to portraying an AI as a human being (H₂). However, portraying an AI as human does not affect subjective evaluations. Hence, contrary to predictions, superficial anthropomorphization does not improve subjective evaluations.

H₄ The primitive bot behaves in a way that is indistinguishable from humans. ✓ ✓

H₅ The efficient bot achieves better objective outcomes than the primitive bot. ✓ ✗ The simulations predicted a highly performing efficient bot, but these promises did not fully materialize in the field study. Because the efficient bot is highly “rational” and predictable, we show that an astute human negotiator can exploit it.

H₆ The efficient bot is perceived as less humanlike than the primitive bot. ✓ ✓

H₇ The superhuman bot achieves better objective outcomes than the primitive bot. ✓ ✓ In the experiment, not only does the superhuman bot (λ = .3) outperform the primitive bot, as predicted, but it also outperforms the efficient bot (see H₅) and other human negotiators.

H₈ The superhuman bot is perceived as more humanlike than the efficient bot. ✓ ✓ In the experiment, not only does the superhuman bot (λ = .3) appear more humanlike than the efficient bot, as predicted, but it is also rated as more humanlike than actual human participants.

Notes: N.A. = not applicable.

Conclusions

Summary

Companies are increasingly using AI to automate their negotiation processes. While there are considerable benefits for firms to rely on AI (e.g., automation, cost-cutting, consistency), little attention has been paid to the psychological and relational impact of negotiating against AI. We posited that using AI bots may lead to unfavorable subjective evaluations, which have been shown to impact the outcomes and profitability of future negotiations. Hence, the use of AI in bargaining may be beneficial in the short term but detrimental in the long term, and no research to date has addressed that question.

On the one hand, many solutions have been proposed to assuage algorithm aversion (e.g., explaining the AI reasoning or giving control over the AI's output), but they have been suggested in contexts where humans’ and AI's objectives are perfectly aligned. They do not transpose easily to an adversarial context such as bargaining.

On the other hand, research has shown that superficial anthropomorphization (e.g., portraying the AI as being human in its appearance) could help foster trust and familiarity and, hence, may possibly prove beneficial in a bargaining context. Still, we posit that it might not be sufficient. We propose the concept of algorithmic anthropomorphization, namely, a novel GAN framework to train an AI to improve on its quantitative objectives while appearing humanlike in doing so, and test three versions of a superhuman bot experimentally. We compare it with two alternatives: a bot that replicates human behaviors without trying to improve the bargaining outcomes (the primitive bot) and one that focuses on improving the bargaining outcomes without the constraints of trying to appear humanlike at all (the efficient bot).

During our initial exploration of the phenomenon, and in the specific context of a low-stakes, sequential bargaining game, we show that bargaining against a bot, indeed, negatively affects participants’ subjective evaluations of the negotiations’ outcome, process, and relationship, even after controlling for the objective outcomes of the game. More importantly—and maybe more surprisingly—this effect also holds after controlling for identity disclosure (i.e., superficial anthropomorphism) and the results of the Turing test. In other words, it does not matter whether participants are told or believe they are bargaining against an AI. The fact that they are bargaining against an AI is sufficient to affect subjective evaluations negatively, which may affect their willingness to engage in future negotiations with the firm.

The superhuman bot (λ = .3) achieves perfect anthropomorphization to a point where it appears more humanlike to participants than actual human participants, yet outperforms human players by about 20%. Because of its unpredictability, the superhuman bot also outperforms the efficient bot, whose perfect rationality makes it too predictable—at least in our specific experimental setting—and hence exploitable.

This result is confirmed by anecdotal evidence. One participant who bargained against the superhuman bot wrote: “I personally enjoyed my bargain. Given that I bargained with a bot, it felt like the bot had a reasoning and we could reach a compromise at some points. If I’m being honest, I’d say it was a smart bot.” Another participant claimed, “I would just like to say how interesting and enjoyable the study was. I believe I was bargaining with a bot and I was almost energetically trying to judge where they would go within the bargaining to get a deal. I have experienced careers in the past which involved a lot of face-to-face negotiation and during this study I actually got some of those feelings that I used to have.”

Uncanny Valley?

Despite their superior anthropomorphism, the superhuman bots achieve lower subjective evaluations than human bargainers. If several bot versions are both economically efficient and undistinguishable from other human beings, how come? One conjecture is that bots may fall prey to the uncanny valley phenomenon (Mori, MacDorman, and Kageki 2012).

In our deep learning framework, the unit of analysis to train all bots (whether to replicate human behaviors or improve on them) is a single bargaining sequence. In other words, no bot is trained to consider (or even be aware of) the outcomes from previous rounds with the same player, as doing so would increase the model complexity (and training data requirements) by several orders of magnitude. If it were possible to circumvent this limitation, we would expect the performance of all models to improve significantly.

Still, human players do not start each bargaining game from a blank slate. Game after game, they form a mental model of their bargaining partners, build trust or distrust, and even may “invest” in the relationship, hoping to reap the benefits in later rounds. If human bargainers signal their intentions across games, and the bot blatantly ignores them, it might create an uneasy feeling of one-way communication and the absence of reciprocity. Of the four subconstructs constituting the SVI, all bots perform the worst on feelings about the relationship, which is consistent with our conjecture. Interestingly, human participants do not even need to perceive they are bargaining against a bot to feel uneasy.

If confirmed beyond the limited context of our experimental setting (i.e., small-stakes, sequential bargaining), this would be a significant warning call for firms considering replacing human negotiators with automated AI solutions. Our research suggests that even an AI that is economically efficient, hard to exploit, and perceived as more humanlike than actual human beings may still behave in a way that, unbeknownst to all, could hurt the relationship between parties (e.g., loss of reciprocity, lowered mutual understanding and commitment).

Sample Differences

The sample used to train our models (Camerer, Nave, and Smith 2019) exclusively consisted of young students (μ = 21.3, σ = 2.4) at premier educational institutions in the United States (Caltech and UCLA) and possibly primed with relevant game-theory coursework. They also played 135 games instead of 20, likely generating boredom and inattention. The incentive-alignment mechanism was based on a lottery rather than directly proportional to their economic performance. The large number of games they played may also have diluted the incentive alignment of the experiment and made the outcome of each game far less economically relevant. Comparatively, our sample is much more diverse, and the outcome of each game matters proportionally more. It is, therefore, not surprising that the behaviors of both samples diverge significantly⁴ (i.e., the student sample is more greedy and less likely to reach an agreement).

While such sample divergence might be considered an issue, it is typical of AI development challenges that companies face in real life. It is quite common for data used to calibrate an AI model not to be perfectly representative of the circumstances in which the model will be used. For instance, research has shown that state-of-the-art object detection systems are more likely to correctly identify pedestrians when they are light-skinned than when they are dark-skinned (Wilson, Hoffman, and Morgenstern 2019), leading to a flurry of articles claiming that self-driving cars relying on these systems were “racist” and might pose a greater danger to dark-skinned pedestrians than to light-skinned ones (e.g., Makoni 2022). The original study's authors correctly identified that the source of the problem was the underrepresentation of dark-skinned pedestrians in the training data (barely 22%) and that it could be corrected by reweighting the sample. In the same vein, digital assistants recognize white American accents better than any other (Rangarajan 2021) and thus are not well adapted to many real-life usage situations, most likely for similar reasons. Several researchers have highlighted the risks of such biases, such as creating cultural barriers or safety risks, or violating cultural values (Prabhakaran, Qadri, and Hutchinson 2022). In data labeling, a fastidious and time-consuming but essential step in many AI applications, Google and Microsoft largely rely on human coders located in Africa due to their cheap labor force (Hale 2019), with underinvestigated consequences (e.g., would a customer email be equally labeled “angry” by an African, Asian, or American coder?).

The fewer differences between the training data and the actual context in which an AI model will be deployed, the better the latter should perform. Our models likely suffer from a mismatch between the students in Camerer, Nave, and Smith (2019) and the Prolific respondents in our study. Still, it is reassuring to observe that the bots could develop excellent bargaining strategies from the Camerer, Nave, and Smith sample and that such strategies remained effective against a markedly different population. For instance, the informed superhuman bot (λ = .3) achieved payoffs 18% (human avatar) and 21% (bot avatar) greater than those of their Prolific counterparts. It is likely that the student sample, while greedier and less inclined to reach a deal, contained enough varied cases to allow an AI to learn alternative strategies successfully.

Key Takeaways

Our research is a preliminary exploration of the impact of using AI on the subjective evaluations of negotiations. While the context (a single, small-stakes, unstructured bargaining game) might limit the generalizability of our results, we highlight three takeaways: methodological, substantive, and managerial.

Methodological

We build and test a novel GAN framework that simultaneously pursues objective payoffs and anthropomorphization. In settings of strategic interactions, work has been done on developing AI that finds optimal solutions (Lewis et al. 2017) or that is anthropomorphized in its behavior (Jacob et al. 2022; McIlroy-Young et al. 2022). To the best of our knowledge, however, no work has been done on building AI that can find optimal solutions while being rewarded to behave like humans in the process. In the specific context of a low-stakes, unstructured bargaining game, our preliminary results show that the resulting superhuman bot achieves superior economic performance while being indistinguishable from human beings.

Substantive

We contribute to the human–AI interaction literature by highlighting both the importance and shortcomings of the anthropomorphization of strategies (rather than appearances). While superficial anthropomorphization contributes to portraying a bot as human, algorithmic anthropomorphization is equally crucial to pass the Turing test since the bot strategy may reveal its nonhuman nature. In stark contrast with predictions from the algorithm aversion literature, however, in our small-stakes bargaining game, neither form of anthropomorphization alleviates the detrimental effects of AI on subjective evaluations. While our empirical focus is on a specific bargaining context, we speculate that outcome differences across bargaining situations—especially those involving AI—are primarily driven by negotiators’ perceptions of fairness, predictability, mutual understanding, and exploitation. While AI negotiators’ ability to replicate humanlike unpredictability seems to help, it still fails to manage relational signals over multiple interactions: The AI negotiator does not signal its intentions or dispositions, and is oblivious to the signals it receives from its human counterpart. This inability likely moderates the differences we observe across conditions, influencing both economic outcomes and subjective evaluations. This theoretical perspective, which emphasizes the interplay between algorithmic predictability and perceived relational intent, offers valuable directions for future research in improving human–AI interactions.

Managerial

We urge companies that rely on AI negotiators to consider going beyond short-term objective outcomes to measure success. Although our initial exploration of the phenomenon relies on a simple, low-stakes bargaining game, our preliminary results suggest that AI negotiators may hurt subjective evaluations and, consequently, long-term relationships, even when the AI is objectively good and indistinguishable from human beings.

Future Research

Beyond this research, bargaining with AI still poses many questions that have yet to be answered. How do human beings adapt their behavior to AI bots? Are there certain traits that make humans better (or worse) bargainers against AI bots? Can we cross the uncanny valley by building an AI model that goes beyond optimizing one bargaining game at a time but also models the long-term relationship with the other party across bargaining rounds?

Based on discussions with managers and researchers at Pactum AI, the industry also seems to be concerned with how AI (1) could better account for cultural differences, (2) anticipate the other party's priorities as quickly as possible to better “calibrate” the negotiations, and (3) adapt one's strategy on the spot based on subtle signals sent by the other party—qualities that professional human negotiators excel at compared with AI systems. Recent literature also mentions that (4) “AI-powered negotiation agents are likely to develop biases and create unfair deals or unethical interactions, especially when trained or given rules that make them purely utilitarian” (Falcão Filho 2024a, p. 53). Interestingly, including a penalty for being humanlike in the response function partly mitigates this latter concern.

With the rapid diffusion of AI in society, we can expect strategic interactions with AI to be increasingly normalized, and expanding our understanding of the psychological aspect of this phenomenon becomes imperative. Our work aims to build knowledge in this nascent area.

Supplemental Material

sj-pdf-1-mrj-10.1177_00222437251375323 - Supplemental material for Bots Bargaining with Humans: Building AI Super-Bargainers with Algorithmic Anthropomorphization

Supplemental material, sj-pdf-1-mrj-10.1177_00222437251375323 for Bots Bargaining with Humans: Building AI Super-Bargainers with Algorithmic Anthropomorphization by Sumon Chaudhuri and Arnaud De Bruyn in Journal of Marketing Research

Feature Name	Description
Duration	The amount of time elapsed since the previous action.
Game start	A dummy variable (0 or 1) that indicates the first step in the sequence.
Game end	A dummy variable (0 or 1) that indicates the last step in the sequence.
Self game state	Indicates whether the latest action of the player is to do nothing (0), make a counteroffer (1), or accept the other player's offer (2).
Self offer	The player's current offer.
Other game state	Indicates whether the latest action of the opponent is to do nothing (0), make a counteroffer (1), or accept the other player's offer (2).
Other offer	The other player's current offer.
Fairness	The extent to which the player's offer deviates from a fair split of the pie. For the uninformed player, this feature stores 0 as they do not have information about the pie size being bargained over, and hence will not know what a fair split is supposed to be in that game. For the informed player, this feature stores the value (self offer − (pie/2))/(pie/2).
Pie size	Stores pie size for the informed player. The value is set to 0 for the uninformed player.

	← Fully Human					Fully Efficient →
Acceptance rate
Overall	.50	.51	.43	.58	.60	.77
Pies $1–$4	.42	.41	.35	.46	.49	.71
Pies $5 and $6	.66	.70	.59	.81	.82	.90
Average payoffs—including strikes
Informed	.28 (.31)	.28 (.32)	.26 (.33)	.33 (.35)	.37 (.36)	.46 (.34)
Uninformed	.22 (.27)	.23 (.27)	.17 (.25)	.25 (.29)	.23 (.28)	.31 (.28)
Average payoffs—if agreement
Informed	.55 (.20)	.55 (.22)	.60 (.23)	.57 (.26)	.61 (.26)	.59 (.26)
Uninformed	.45 (.20)	.45 (.22)	.40 (.23)	.43 (.26)	.39 (.26)	.41 (.26)
Average number of counteroffers
Informed	2.22 (1.67)	2.16 (1.46)	2.42 (1.79)	2.19 (1.23)	1.61 (.82)	5.24 (2.89)
Uninformed	2.00 (1.03)	1.97 (.95)	2.06 (.96)	2.10 (.92)	2.12 (.93)	2.06 (.75)
Average time to acceptance	7.54 (2.27)	7.34 (2.20)	7.49 (2.16)	7.26 (1.87)	7.13 (2.00)	6.54 (1.87)
Percentage of agreements
0 to 6 seconds	18.27	18.50	15.08	27.80	31.86	53.20
6 to 8.5 seconds	32.31	29.04	26.93	29.15	31.05	35.09
8.5 to 10 seconds	21.52	20.93	19.76	15.42	18.21	7.74
Percentage of fair splits
Overall	33.78	24.34	13.77	17.98	15.39	14.37
Pies $1–$4	44.18	32.33	16.95	23.40	19.20	15.97
Pies $5 and $6	20.90	14.88	9.86	11.98	10.85	11.92
Bot detection	N.A.	.0%	31.0%	59.6%	69.4%	100.0%

	← Fully Human					Fully Efficient →
Objective Outcomes
Acceptance rate
Overall	.74	.58	.56	.67	.58	.67
Pies $1–$4	.73	.53	.52	.59	.48	.63
Pies $5 and $6	.78	.73	.68	.94	.84	.78
Average payoffs—including strikes
Informed	.33 (.15)	.36 (.15)	.29 (.19)	.39 (.15)	.30 (.16)	.32 (.16)
Uninformed	.41 (.14)	.22 (.07)	.26 (.11)	.28 (.07)	.28 (.11)	.34 (.14)
Average payoffs—if agreement
Informed	.44 (.60)	.62 (.28)	.53 (.70)	.59 (.37)	.51 (.59)	.48 (.61)
Uninformed	.56 (.60)	.38 (.28)	.47 (.70)	.41 (.37)	.49 (.59)	.52 (.61)
Average number of counteroffers
Informed	.54 (.62)	1.75 (.59)	1.59 (.54)	1.62 (.66)	2.10 (.45)	4.33 (.79)
Uninformed	1.22 (.63)	1.18 (.69)	1.26 (.66)	1.07 (.57)	.88 (.79)	1.68 (1.38)
Average time to acceptance	4.74 (4.27)	6.00 (3.80)	6.05 (4.19)	5.96 (6.28)	6.90 (3.78)	6.27 (4.89)
Percentage of agreements
0 to 6 seconds	63.7	39.3	38.5	44.4	29.6	43.6
6 to 8.5 seconds	12.2	19.4	17.7	23.2	29.3	25.3
8.5 to 10 seconds	5.1	23.2	30.7	23.4	24.2	30.4
Percentage of fair splits
Overall	62.00	24.18	11.44	24.43	20.48	20.40
Pies $1–$4	64.49	30.93	14.08	30.72	24.63	22.40
Pies $5 and $6	55.18	8.97	5.36	11.72	13.77	15.21
Subjective Evaluations
Turing test	3.40 (1.46)	3.61 (1.79)	3.98 (1.82)	4.11 (1.71)	3.87 (1.80)	2.93 (1.61)
Feelings about …
the outcome	5.18 (1.43)	3.83 (1.30)	3.91 (1.51)	4.38 (1.21)	4.61 (1.23)	4.42 (1.34)
the self	4.81 (.88)	4.58 (.94)	4.38 (.89)	4.75 (.69)	4.83 (.84)	4.59 (.79)
the process	4.90 (1.60)	3.43 (1.50)	3.38 (1.60)	3.88 (1.41)	4.41 (1.44)	4.00 (1.45)
the relationship	4.95 (1.70)	3.21 (1.59)	3.20 (1.71)	3.70 (1.63)	4.33 (1.49)	3.90 (1.56)

	← Fully Human					Fully Efficient →
Objective Outcomes
Acceptance rate
Overall	.74	.60	.63	.68	.59	.76
Pies $1–$4	.73	.56	.61	.61	.51	.73
Pies $5 and $6	.78	.74	.70	.89	.83	.86
Average payoffs—including strikes
Informed	.33 (.15)	.36 (.13)	.34 (.20)	.40 (.17)	.32 (.20)	.38 (.19)
Uninformed	.41 (.14)	.24 (.08)	.29 (.13)	.28 (.07)	.27 (.12)	.38 (.14)
Average payoffs—if agreement
Informed	.44 (.60)	.60 (.34)	.54 (.73)	.59 (.40)	.55 (.76)	.50 (.72)
Uninformed	.56 (.60)	.40 (.34)	.46 (.73)	.41 (.40)	.45 (.76)	.50 (.72)
Average number of counteroffers
Informed	.54 (.62)	1.78 (.49)	1.45 (.53)	1.48 (.74)	2.12 (.46)	4.51 (1.18)
Uninformed	1.22 (.63)	1.51 (.81)	1.46 (.78)	1.19 (.81)	1.47 (1.53)	2.58 (2.00)
Average time to acceptance	4.74 (4.27)	6.52 (3.85)	5.97 (4.57)	5.51 (5.95)	6.58 (4.35)	5.95 (3.61)
Percentage of agreements
0 to 6 seconds	63.7	35.3	43.8	51.3	34.8	57.4
6 to 8.5 seconds	12.2	25.9	19.7	17.3	25.6	22.2
8.5 to 10 seconds	5.1	25.7	26.6	21.5	20.7	19.8
Percentage of fair splits
Overall	62.00	25.19	11.20	21.69	17.02	19.05
Pies $1–$4	64.49	32.06	13.65	25.26	19.32	18.64
Pies $5 and $6	55.18	8.78	3.95	14.97	12.98	20.05
Subjective Evaluations
Turing test	3.40 (1.46)	2.41 (1.49)	2.26 (1.46)	2.27 (1.44)	2.47 (1.56)	1.96 (1.12)
Feelings about …
the outcome	5.18 (1.43)	3.94 (1.35)	4.00 (1.37)	4.50 (1.39)	4.37 (1.30)	4.73 (1.38)
the self	4.81 (.88)	4.50 (.82)	4.45 (.79)	4.64 (.93)	4.61 (.84)	4.76 (.80)
the process	4.90 (1.60)	3.58 (1.36)	3.38 (1.58)	3.91 (1.57)	4.02 (1.44)	4.37 (1.36)
the relationship	4.95 (1.70)	3.46 (1.49)	3.35 (1.71)	3.81 (1.55)	3.90 (1.57)	4.24 (1.42)

	Turing Test	Feelings About the Outcome	Feelings About the Self	Feelings About the Process	Feelings About the Relationship
Intercept	3.682***	3.611***	4.124***	3.039***	3.079***
(.267)	(.24)	(.152)	(.262)	(.282)
primitive bot	.141	−.940***	−.050	−1.020***	−1.270***
(.227)	(.187)	(.119)	(.205)	(.220)
superhuman bot (λ = .5)	.259	−.850***	−.170	−1.120***	−1.310***
(.222)	(.184)	(.117)	(.200)	(.216)
superhuman bot (λ = .3)	.334	−.600***	.021	−.890***	−1.100***
(.218)	(.180)	(.114)	(.197)	(.212)
superhuman bot (λ = .1)	.359	−.430*	.073	−.400*	−.590**
(.216)	(.179)	(.113)	(.195)	(.210)
efficient bot	−.330	−.490**	−.030	−.570**	−.750***
(.208)	(.172)	(.109)	(.187)	(.202)
Earnings (self)	−.020*	.045***	.023***	.046***	.046***
(.009)	(.007)	(.005)	(.008)	(.009)
Earnings (other)	.008	.022***	.004	.031***	.027***
(.005)	(.004)	(.003)	(.005)	(.005)
Robot avatar	−1.410***	.078	−.010	.059	.149
(.104)	(.094)	(.059)	(.102)	(.110)
Turing test	—	.042	.028	.067*	.093**
—	(.026)	(.016)	(.028)	(.031)
R²	.184	.136	.051	.159	.148
Adjusted R²	.177	.128	.043	.151	.141

	Bot Exploitation
Intercept	2.130***
(.208)
primitive bot	−.904***
(.267)
superhuman bot (λ = .5)	−.521*
(.261)
superhuman bot (λ = .3)	−.624*
(.263)
superhuman bot (λ = .1)	.253
(.262)
efficient bot	2.444***
(.259)
Robot avatar	.405**
(.131)
R²	.266
Adjusted R²	.261

Hypothesis	Simulations	Experiment	Comments
H₁	Everything else being equal, a participant gives lower subjective evaluations when they negotiate against an AI than when they negotiate against another human negotiator.	N.A.	✓	Although the results match predictions from the algorithm aversion literature, the underlying mechanism is not based on whether participants are told they are negotiating with an AI (as predicted by this literature), but if they are. We conjecture that this effect is caused by the “uncanny valley” phenomenon (see “Conclusions”).
H₂	Superficial anthropomorphization contributes to portraying an AI negotiator as human.	N.A.	✓
H₃	Portraying an AI negotiator as human through superficial anthropomorphization improves subjective evaluations.	N.A.	✗	Superficial anthropomorphization contributes to portraying an AI as a human being (H₂). However, portraying an AI as human does not affect subjective evaluations. Hence, contrary to predictions, superficial anthropomorphization does not improve subjective evaluations.
H₄	The primitive bot behaves in a way that is indistinguishable from humans.	✓	✓
H₅	The efficient bot achieves better objective outcomes than the primitive bot.	✓	✗	The simulations predicted a highly performing efficient bot, but these promises did not fully materialize in the field study. Because the efficient bot is highly “rational” and predictable, we show that an astute human negotiator can exploit it.
H₆	The efficient bot is perceived as less humanlike than the primitive bot.	✓	✓
H₇	The superhuman bot achieves better objective outcomes than the primitive bot.	✓	✓	In the experiment, not only does the superhuman bot (λ = .3) outperform the primitive bot, as predicted, but it also outperforms the efficient bot (see H₅) and other human negotiators.
H₈	The superhuman bot is perceived as more humanlike than the efficient bot.	✓	✓	In the experiment, not only does the superhuman bot (λ = .3) appear more humanlike than the efficient bot, as predicted, but it is also rated as more humanlike than actual human participants.

Footnotes

Coeditor

Brett R. Gordon

Associate Editor

Eric T. Bradlow

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge the financial support of the Research Center at the ESSEC Business School.

ORCID iDs

Sumon Chaudhuri

Arnaud De Bruyn

Notes

References

Aggarwal

McGill

A.L.

(2007), “Is That Car Smiling at Me? Schema Congruity as a Basis for Evaluating Anthropomorphized Products,” Journal of Consumer Research, 34 (4), 468–79.

Anand

Lee

(2023), “Using Deep Learning to Overcome Privacy and Scalability Issues in Customer Data Transfer,” Marketing Science, 42 (1), 189–207.

Artist (2024), “15 AI-Generated Artworks that Captivated the World,” (June 23), https://www.theartist.me/art/15-ai-generated-artworks-that-captivated-the-world/.

Becker

W.J.

Curhan

J.R.

(2018), “The Dark Side of Subjective Value in Sequential Negotiations: The Mediating Role of Pride and Anger,” Journal of Applied Psychology, 103 (1), 74–87.

Blader

S.L.

Chen

Y.R.

(2012), “Differentiating the Effects of Status and Power: A Justice Perspective,” Journal of Personality and Social Psychology, 102 (5), 994–1014.

Brown

(2023), “An AI Just Negotiated a Contract for the First Time Ever—and No Human Was Involved,” CNBC (November 7), https://www.cnbc.com/2023/11/07/ai-negotiates-legal-contract-without-humans-involved-for-first-time.html.

Camerer

C.F.

Nave

Smith

(2019), “Dynamic Unstructured Bargaining with Private Information: Theory, Experiment, and Outcome Prediction via Machine Learning,” Management Science, 65 (4), 1867–90.

Crolic

Thomaz

Hadi

Stephen

A.T.

(2022), “Blame the Bot: Anthropomorphism and Anger in Customer–Chatbot Interactions,” Journal of Marketing, 86 (1), 132–48.

Curhan

J.R.

Elfenbein

H.A.

Kilduff

G.J.

(2009), “Getting Off on the Right Foot: Subjective Value Versus Economic Value in Predicting Longitudinal Job Outcomes from Job Offer Negotiations,” Journal of Applied Psychology, 94 (2), 524–34.

10.

Curhan

J.R.

Elfenbein

H.A.

(2006), “What Do People Value When They Negotiate? Mapping the Domain of Subjective Value in Negotiation,” Journal of Personality and Social Psychology, 91 (3), 493–512.

11.

Dai

Singh

(2020), “Conspicuous by Its Absence: Diagnostic Expert Testing Under Uncertainty,” Marketing Science, 39 (3), 540–63.

12.

De Pauw

A.S.

Venter

Neethling

(2011), “The Effect of Negotiator Creativity on Negotiation Outcomes in a Bilateral Negotiation,” Creativity Research Journal, 23 (1), 42–50.

13.

De Visser

E.J.

Monfort

S.S.

McKendrick

Smith

M.A.B.

McKnight

P.E.

Krueger

Parasuraman

(2016), “Almost Human: Anthropomorphism Increases Trust Resilience in Cognitive Agents,” Journal of Experimental Psychology: Applied, 22 (3), 331–49.

14.

Dietvorst

B.J.

Simmons

J.P.

Massey

(2015), “Algorithm Aversion: People Erroneously Avoid Algorithms After Seeing Them Err,” Journal of Experimental Psychology: General, 144 (1), 114–26.

15.

Dietvorst

B.J.

Simmons

J.P.

Massey

(2018), “Overcoming Algorithm Aversion: People Will Use Imperfect Algorithms If They Can (Even Slightly) Modify Them,” Management Science, 64 (3), 1155–70.

16.

Falcão Filho

H.A.

(2024a), “Making Sense of Negotiation and AI: The Blossoming of a New Collaboration,” International Journal of Commerce and Contracting, 8 (1–2), 44–64.

17.

Falcão Filho

H.A.

(2024b), “The Power of AI to Shape Negotiations,” INSEAD Knowledge (December 10), https://knowledge.insead.edu/strategy/power-ai-shape-negotiations.

18.

Goodfellow

I.J.

Pouget-Abadie

Mirza

Warde-Farley

Ozair

Courville

Bengio

(2014), “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, 27, https://papers.nips.cc/paper_files/paper/2014/file/f033ed80deb0234979a61f95710dbe25-Paper.pdf .

19.

Hale

(2019), “Google & Microsoft Banking on Africa’s AI Labeling Workforce,” Forbes (May 28), https://www.forbes.com/sites/korihale/2019/05/28/google-microsoft-banking-on-africas-ai-labeling-workforce/.

20.

Hochreiter

Schmidhuber

(1997), “Long Short-Term Memory,” Neural Computation, 9 (8), 1735–80.

21.

Hood

Wei

(2024), “AI Turned a Two-Month Task into Hours, Says JLL CMO Siddharth Taparia,” Business Insider (October 3), https://www.businessinsider.com/ai-turned-two-month-task-into-hours-jll-cmo-siddharth-taparia-2024-10.

22.

Jacob

A.P.

D.J.

Farina

Lerer

Bakhtin

Andreas

Brown

(2022), “Modeling Strong and Human-Like Gameplay with KL-Regularized Search,” Proceedings of Machine Learning Research, 162, 9695–728.

23.

Johnson

E.J.

Camerer

Sen

Rymon

(2002), “Detecting Failures of Backward Induction: Monitoring Information Search in Sequential Bargaining,” Journal of Economic Theory, 104 (1), 16–47.

24.

Jussupow

Benbasat

Heinzl

(2020), “Why Are We Averse Towards Algorithms? A Comprehensive Literature Review on Algorithm Aversion,” Proceedings of the 28th European Conference on Information Systems (ECIS2020), 1–16, https://www.researchgate.net/publication/344401293 .

25.

Karagözoğlu

Kocher

M.G.

(2019), “Bargaining Under Time Pressure from Deadlines,” Experimental Economics, 22 (2), 419–40.

26.

Kawaguchi

(2020), “When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business,” Management Science, 67 (3), 1670–95.

27.

Kingma

D.P.

J.L.

(2014), “Adam: A Method for Stochastic Optimization,” 3rd International Conference on Learning Representations, arXiv, https://arxiv.org/abs/1412.6980v9 .

28.

Klein

(2017), “Google’s AlphaZero Destroys Stockfish in 100-Game Match,” Chess.com (December 6), https://www.chess.com/news/view/google-s-alphazero-destroys-stockfish-in-100-game-match.

29.

Kwak

Puzakova

Rocereto

J.F.

(2015), “Better Not Smile at the Price: The Differential Role of Brand Anthropomorphization on Perceived Price Fairness,” Journal of Marketing, 79 (4), 56–76.

30.

Lewis

Yarats

Dauphin

Y.N.

Parikh

Batra

(2017), “Deal or No Deal? End-to-End Learning for Negotiation Dialogues,” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2443–53, https://doi.org/10.18653/v1/D17-1259.

31.

Liu

(2022), “Dynamic Coupon Targeting Using Batch Deep Reinforcement Learning: An Application to Livestream Shopping,” Marketing Science, 42 (4), 637–58.

32.

Longoni

Bonezzi

Morewedge

C.K.

(2019), “Resistance to Medical Artificial Intelligence,” Journal of Consumer Research, 46 (4), 629–50.

33.

Luo

Tong

Fang

(2019), “Frontiers: Machines vs. Humans: The Impact of Artificial Intelligence Chatbot Disclosure on Customer Purchases,” Marketing Science, 38 (6), 937–47.

34.

Makoni

(2022), “From Cars That Can’t ‘See’ Dark Skin to Tech That Can’t ‘Hear’ Diverse Accents: These Researchers Are Tackling Bias,” People of Color in Tech (August 23), https://peopleofcolorintech.com/engineers/self-driving-cars-that-cant-detect-black-skin-to-digital-assistants-that-have-trouble-understanding-non-white-accents-here-are-the-students-erasing-bias/.

35.

March

(2021), “Strategic Interactions Between Humans and Artificial Intelligence: Lessons from Experiments with Computer Players,” Journal of Economic Psychology, 87, 102426.

36.

Martins

Cruz

J.M.

Cruz

Henriques Abreu

(2020), “Adversarial Machine Learning Applied to Intrusion and Malware Scenarios: A Systematic Review,” IEEE Access, 8, 35403–19.

37.

Mathur

M.B.

Reichling

D.B.

(2016), “Navigating a Social World with Robot Partners: A Quantitative Cartography of the Uncanny Valley,” Cognition, 146, 22–32.

38.

McDowell

(2024), “How AI Is Transforming Social Media Behind the Scenes,” Vogue Business (August 20), https://www.voguebusiness.com/story/technology/how-ai-is-transforming-social-media-behind-the-scenes.

39.

McIlroy-Young

Wang

Sen

Kleinberg

Anderson

(2022), “Learning Models of Individual Behavior in Chess,” Proceedings of the 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1253–63, https://doi.org/10.1145/3534678.3539367.

40.

Mori

MacDorman

K.F.

Kageki

(2012), “The Uncanny Valley,” IEEE Robotics and Automation Magazine, 19 (2), 98–100.

41.

Porter

Machery

(2024), “AI-Generated Poetry Is Indistinguishable from Human-Written Poetry and Is Rated More Favorably,” Nature: Scientific Reports, 14, 26133.

42.

Prabhakaran

Qadri

Hutchinson

(2022), “Cultural Incongruencies in Artificial Intelligence,” arXiv, https://arxiv.org/abs/2211.13069v1.

43.

Rangarajan

(2021), “Hey Siri—Why Don’t You Understand More People Like Me?” Mother Jones (February 23), https://www.motherjones.com/media/2021/02/digital-assistants-accents-english-race-google-siri-alexa/.

44.

Rosenberg

Shabtai

Elovici

Rokach

(2021), “Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain,” ACM Computing Surveys, 54 (5), 108.

45.

Sarkar

De Bruyn

(2021), “LSTM Response Models for Direct Marketing Analytics: Replacing Feature Engineering with Deep Learning,” Journal of Interactive Marketing, 53 (1), 80–95.

46.

Snoek

Larochelle

Adams

R.P.

(2012), “Practical Bayesian Optimization of Machine Learning Algorithms,” Advances in Neural Information Processing Systems, 25.

47.

Sutton

R.S.

Barto

A.G.

(2018), Reinforcement Learning: An Introduction, 2nd ed. MIT Press.

48.

Valendin

Reutterer

Platzer

Kalcher

(2022), “Customer Base Analysis with Recurrent Neural Networks,” International Journal of Research in Marketing, 39 (4), 988–1018.

49.

Van Hoek

DeWitt

Lacity

Johnson

(2022), “How Walmart Automated Supplier Negotiations,” Harvard Business Review (November 8), https://hbr.org/2022/11/how-walmart-automated-supplier-negotiations.

50.

Van Zant

A.B.

Kennedy

J.A.

Kray

L.J.

(2023), “Does Hoodwinking Others Pay? The Psychological and Relational Consequences of Undetected Negotiator Deception,” Journal of Personality and Social Psychology, 124 (5), 1001–24.

51.

Wan

E.W.

Chen

R.P.

Jin

(2017), “Judging a Book by Its Cover? The Effect of Anthropomorphism on Product Attribute Processing and Consumer Preference,” Journal of Consumer Research, 43 (6), 1008–30.

52.

Wang

Luo

Wang

Smith

R.H.

(2022), “Deep Reinforcement Learning for Sequential Targeting,” Management Science, 69 (9), 1–22.

53.

Waytz

Heafner

Epley

(2014), “The Mind in the Machine: Anthropomorphism Increases Trust in an Autonomous Vehicle,” Journal of Experimental Social Psychology, 52, 113–17.

54.

Wierstra

Förster

Peters

Schmidhuber

(2010), “Recurrent Policy Gradients,” Logic Journal of the IGPL, 18 (5), 620–34.

55.

Williams

R.J.

(1992), “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning,” Machine Learning, 8 (3), 229–56.

56.

Wilson

Hoffman

Morgenstern

(2019), “Predictive Inequity in Object Detection,” arXiv, https://arxiv.org/abs/1902.11097v1.

57.

Yeomans

Shah

Mullainathan

Kleinberg

(2019), “Making Sense of Recommendations,” Journal of Behavioral Decision Making, 32 (4), 403–14.

58.

Yoo

C.M.

(2019), “Go Master Lee Says He Quits Unable to Win over AI Go Players,” Yonhap News Agency (November 27), https://en.yna.co.kr/view/AEN20191127004800315.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.26 MB

	← Fully Human					Fully Efficient →
	Human vs. Human	primitive vs. primitive	superhuman vs. primitive	superhuman vs. primitive	superhuman vs. primitive	efficient vs. primitive
	(Camerer, Nave, and Smith 2019)		(λ = .5)	(λ = .3)	(λ = .1)
Acceptance rate
Overall	.50	.51	.43	.58	.60	.77
Pies $1–$4	.42	.41	.35	.46	.49	.71
Pies $5 and $6	.66	.70	.59	.81	.82	.90
Average payoffs—including strikes
Informed	.28 (.31)	.28 (.32)	.26 (.33)	.33 (.35)	.37 (.36)	.46 (.34)
Uninformed	.22 (.27)	.23 (.27)	.17 (.25)	.25 (.29)	.23 (.28)	.31 (.28)
Average payoffs—if agreement
Informed	.55 (.20)	.55 (.22)	.60 (.23)	.57 (.26)	.61 (.26)	.59 (.26)
Uninformed	.45 (.20)	.45 (.22)	.40 (.23)	.43 (.26)	.39 (.26)	.41 (.26)
Average number of counteroffers
Informed	2.22 (1.67)	2.16 (1.46)	2.42 (1.79)	2.19 (1.23)	1.61 (.82)	5.24 (2.89)
Uninformed	2.00 (1.03)	1.97 (.95)	2.06 (.96)	2.10 (.92)	2.12 (.93)	2.06 (.75)
Average time to acceptance	7.54 (2.27)	7.34 (2.20)	7.49 (2.16)	7.26 (1.87)	7.13 (2.00)	6.54 (1.87)
Percentage of agreements
0 to 6 seconds	18.27	18.50	15.08	27.80	31.86	53.20
6 to 8.5 seconds	32.31	29.04	26.93	29.15	31.05	35.09
8.5 to 10 seconds	21.52	20.93	19.76	15.42	18.21	7.74
Percentage of fair splits
Overall	33.78	24.34	13.77	17.98	15.39	14.37
Pies $1–$4	44.18	32.33	16.95	23.40	19.20	15.97
Pies $5 and $6	20.90	14.88	9.86	11.98	10.85	11.92
Bot detection	N.A.	.0%	31.0%	59.6%	69.4%	100.0%

	← Fully Human					Fully Efficient →
	Human vs. Human	primitive vs. Human	superhuman vs. Human	superhuman vs. Human	superhuman vs. Human	efficient vs. Human
	(This Article)		(λ = .5)	(λ = .3)	(λ = .1)
	N = 92	N = 80	N = 98	N = 90	N = 93	N = 106
Objective Outcomes
Acceptance rate
Overall	.74	.58	.56	.67	.58	.67
Pies $1–$4	.73	.53	.52	.59	.48	.63
Pies $5 and $6	.78	.73	.68	.94	.84	.78
Average payoffs—including strikes
Informed	.33 (.15)	.36 (.15)	.29 (.19)	.39 (.15)	.30 (.16)	.32 (.16)
Uninformed	.41 (.14)	.22 (.07)	.26 (.11)	.28 (.07)	.28 (.11)	.34 (.14)
Average payoffs—if agreement
Informed	.44 (.60)	.62 (.28)	.53 (.70)	.59 (.37)	.51 (.59)	.48 (.61)
Uninformed	.56 (.60)	.38 (.28)	.47 (.70)	.41 (.37)	.49 (.59)	.52 (.61)
Average number of counteroffers
Informed	.54 (.62)	1.75 (.59)	1.59 (.54)	1.62 (.66)	2.10 (.45)	4.33 (.79)
Uninformed	1.22 (.63)	1.18 (.69)	1.26 (.66)	1.07 (.57)	.88 (.79)	1.68 (1.38)
Average time to acceptance	4.74 (4.27)	6.00 (3.80)	6.05 (4.19)	5.96 (6.28)	6.90 (3.78)	6.27 (4.89)
Percentage of agreements
0 to 6 seconds	63.7	39.3	38.5	44.4	29.6	43.6
6 to 8.5 seconds	12.2	19.4	17.7	23.2	29.3	25.3
8.5 to 10 seconds	5.1	23.2	30.7	23.4	24.2	30.4
Percentage of fair splits
Overall	62.00	24.18	11.44	24.43	20.48	20.40
Pies $1–$4	64.49	30.93	14.08	30.72	24.63	22.40
Pies $5 and $6	55.18	8.97	5.36	11.72	13.77	15.21
Subjective Evaluations
Turing test	3.40 (1.46)	3.61 (1.79)	3.98 (1.82)	4.11 (1.71)	3.87 (1.80)	2.93 (1.61)
Feelings about …
the outcome	5.18 (1.43)	3.83 (1.30)	3.91 (1.51)	4.38 (1.21)	4.61 (1.23)	4.42 (1.34)
the self	4.81 (.88)	4.58 (.94)	4.38 (.89)	4.75 (.69)	4.83 (.84)	4.59 (.79)
the process	4.90 (1.60)	3.43 (1.50)	3.38 (1.60)	3.88 (1.41)	4.41 (1.44)	4.00 (1.45)
the relationship	4.95 (1.70)	3.21 (1.59)	3.20 (1.71)	3.70 (1.63)	4.33 (1.49)	3.90 (1.56)

	← Fully Human					Fully Efficient →
	Human vs. Human (This Article)	primitive vs. Human	superhuman vs. Human (λ = .5)	superhuman vs. Human (λ = .3)	superhuman vs. Human (λ = .1)	efficient vs. Human
	N = 92	N = 90	N = 91	N = 96	N = 93	N = 90
Objective Outcomes
Acceptance rate
Overall	.74	.60	.63	.68	.59	.76
Pies $1–$4	.73	.56	.61	.61	.51	.73
Pies $5 and $6	.78	.74	.70	.89	.83	.86
Average payoffs—including strikes
Informed	.33 (.15)	.36 (.13)	.34 (.20)	.40 (.17)	.32 (.20)	.38 (.19)
Uninformed	.41 (.14)	.24 (.08)	.29 (.13)	.28 (.07)	.27 (.12)	.38 (.14)
Average payoffs—if agreement
Informed	.44 (.60)	.60 (.34)	.54 (.73)	.59 (.40)	.55 (.76)	.50 (.72)
Uninformed	.56 (.60)	.40 (.34)	.46 (.73)	.41 (.40)	.45 (.76)	.50 (.72)
Average number of counteroffers
Informed	.54 (.62)	1.78 (.49)	1.45 (.53)	1.48 (.74)	2.12 (.46)	4.51 (1.18)
Uninformed	1.22 (.63)	1.51 (.81)	1.46 (.78)	1.19 (.81)	1.47 (1.53)	2.58 (2.00)
Average time to acceptance	4.74 (4.27)	6.52 (3.85)	5.97 (4.57)	5.51 (5.95)	6.58 (4.35)	5.95 (3.61)
Percentage of agreements
0 to 6 seconds	63.7	35.3	43.8	51.3	34.8	57.4
6 to 8.5 seconds	12.2	25.9	19.7	17.3	25.6	22.2
8.5 to 10 seconds	5.1	25.7	26.6	21.5	20.7	19.8
Percentage of fair splits
Overall	62.00	25.19	11.20	21.69	17.02	19.05
Pies $1–$4	64.49	32.06	13.65	25.26	19.32	18.64
Pies $5 and $6	55.18	8.78	3.95	14.97	12.98	20.05
Subjective Evaluations
Turing test	3.40 (1.46)	2.41 (1.49)	2.26 (1.46)	2.27 (1.44)	2.47 (1.56)	1.96 (1.12)
Feelings about …
the outcome	5.18 (1.43)	3.94 (1.35)	4.00 (1.37)	4.50 (1.39)	4.37 (1.30)	4.73 (1.38)
the self	4.81 (.88)	4.50 (.82)	4.45 (.79)	4.64 (.93)	4.61 (.84)	4.76 (.80)
the process	4.90 (1.60)	3.58 (1.36)	3.38 (1.58)	3.91 (1.57)	4.02 (1.44)	4.37 (1.36)
the relationship	4.95 (1.70)	3.46 (1.49)	3.35 (1.71)	3.81 (1.55)	3.90 (1.57)	4.24 (1.42)

Bots Bargaining with Humans: Building AI Super-Bargainers with Algorithmic Anthropomorphization

Abstract

Keywords

Literature Review

Subjective Value Inventory

Algorithm Aversion

Anthropomorphization

Game Mechanics

Methodology

Overview

Deep Learning Model

Feature Engineering

Policy Formulation

Objective Functions

primitive bot

efficient bot

superhuman bot

Training Details

Simulations

Experiment

Experimental Design

Results

Analyses

Algorithm aversion and anthropomorphization (H1–H3)

primitive bot (H4)

efficient bot (H5 and H6)

superhuman bot (H7 and H8)

Understanding the efficient bot’s underperformance

Conclusions

Summary

Uncanny Valley?

Sample Differences

Key Takeaways

Methodological

Substantive

Managerial

Future Research

Supplemental Material

sj-pdf-1-mrj-10.1177_00222437251375323 - Supplemental material for Bots Bargaining with Humans: Building AI Super-Bargainers with Algorithmic Anthropomorphization

Footnotes

Coeditor

Associate Editor

Declaration of Conflicting Interests

Funding

ORCID iDs

Notes

References

Supplementary Material

Algorithm aversion and anthropomorphization (H₁–H₃)

primitive bot (H₄)

efficient bot (H₅ and H₆)

superhuman bot (H₇ and H₈)