
Abstract

Objective

We developed a taxonomy for human–agent teams (HATs) and conducted a literature review of existing HAT testbeds using our proposed taxonomy.

Background

With the increasing interest in HATs, numerous research studies in this field have utilized different testbeds. Despite this, there is a lack of comprehensive understanding regarding the capabilities and limitations of the existing testbeds.

Method

We first developed a taxonomy for HATs by modifying the existing framework for classifying human teams. Our proposed taxonomy comprises ten attributes. Subsequently, using the taxonomy, we analyzed 103 testbeds identified from 235 empirical research studies. After coding each testbed, we conducted frequency analyses on each attribute to determine the distribution of the testbeds.

Results

Regarding team composition, the majority of testbeds afford single human participants paired with a few agents, typically in subordinate roles. Also, in most testbeds, the leadership structure is either designated, with humans assuming leadership roles, or none. The communication dynamics present an area for further exploration, especially with larger team sizes. Additionally, nearly all reviewed testbeds focus on long-term teams, overlooking dynamics in ad hoc teams, which are common in real-world settings.

Conclusion

Our findings underscore the importance of further research into diverse team attributes, such as team composition, leadership structure, and communication structure, direction, and medium. Such research would facilitate a deeper understanding of complex team dynamics in HATs and inform the design of effective teams.

Application

The current study would be valuable for discussing future research directions when developing new testbeds or designing novel experiments leveraging existing ones.

Keywords

human–autonomy interaction, human–agent interaction, autonomous agents, team dynamics, team characteristics, teamwork

Introduction

With the growing adoption of autonomous technologies across various domains, human–agent teams (HATs) have become a key area of research across multiple disciplines such as human factors, robotics, and computer science. Most studies on HATs have utilized proprietary testbeds developed by the respective research teams. Consequently, each testbed possesses distinct characteristics. For instance, the Mixed Initiative Experimental Testbed (MIX) (Barber et al., 2008) involves a RoboLeader capable of gathering information from subordinate robots with limited autonomy, making tactical decisions, and coordinating the robots. In this testbed, one human is tasked with target detection using multiple subordinate robots and with route replanning assisted by the RoboLeader. Another frequently utilized testbed, Blocks World for Teams (BW4T) (Johnson et al., 2009), features a multi-human multi-agent team in which all members cooperate to search through rooms and deliver blocks in a specific order (Chung & Yang, 2025a, 2025c). A more recently developed multi-agent testbed draws on the “team of teams” idea (McChrystal et al., 2015) and allows the teams to reconfigure as problem contexts evolve (Guo et al., 2023a, 2024). These and other testbeds differ in their team designs, task domains, objectives, and team compositions.

Using various testbeds, research studies have explored the effects of a multitude of factors, including human-related factors such as culture, past experience, and spatial ability (Bhat et al., 2022; Chen et al., 2008; Hafızoğlu & Sen, 2018); agent-related factors such as reliability, autonomy level, reputation, and value alignment (Bhat et al., 2024b; Hafızoğlu & Sen, 2018; Hillesheim & Rusnock, 2016; Hoffman & Breazeal, 2007; Mercado et al., 2016); task-related factors including task difficulty and design (Chen et al., 2011a, 2013; Cohen & Imada, 2005); and team-related factors such as team composition, communication type, and communication medium (Chen, Barnes, Qu, & Snyder, 2010; Walliser et al., 2019). Recent reviews on HATs by O’Neill et al. (2022), Lyons et al. (2021), and Chen and Barnes (2014) provide excellent overviews of this literature, with a focus on the dependent and independent variables examined in prior studies.

Despite these contributions, a comprehensive overview of HAT testbeds themselves remains lacking (Chung et al., 2024), and this absence raises several important concerns. First, testbed design often constrains which constructs can be manipulated or measured; for example, the “team of teams” concept (Guo et al., 2023a; McChrystal et al., 2015) can only be evaluated on platforms where team composition can change dynamically. Second, the field has increasingly called for studies involving larger, more heterogeneous teams that reflect real-world teaming scenarios (Karwowski et al., 2025; Lyons et al., 2021; Nguyen et al., 2022; O’Neill et al., 2022). However, without a structured overview of testbed capabilities, researchers cannot easily identify which platforms support the next generation of research questions. Finally, in the absence of a review, laboratories frequently “re-invent the wheel” and develop environments that duplicate existing functionality, diverting valuable resources from theory-driven experimentation to infrastructure building.

To address this need, we conducted a literature review of HAT testbeds featured in prior research to better understand the current landscape of existing HAT testbeds. Our scope is limited to virtual agents without physical embodiment. Testbeds implemented through software programming offer significant flexibility in simulating a wide range of teaming scenarios (e.g., emergency response to fires or crimes), some of which are extremely challenging to create with embodied agents. This review aims to answer the following research questions: What kinds of HAT testbeds exist, and how are they designed? What types of team characteristics do they support? For each key team attribute, how are existing testbeds distributed, and are there any prevalent or underrepresented attributes that need further attention?

We identified a total of 103 testbeds used in 235 studies and analyzed them using a taxonomy developed to categorize HATs. Our taxonomy is based on the widely accepted classification scheme for human–human teams from Wildman et al. (2012), which originally comprises six team-level attributes: task interdependence, role structure, leadership structure, communication structure, physical distribution, and team life span. We iteratively modified the scheme to suit the context of HATs. Notably, we introduced four new attributes relevant to HATs: team composition, leadership role assignment, communication direction, and communication medium.

The current study has made several significant contributions:

(1) We developed a novel taxonomy specifically tailored for HATs by adapting a widely recognized framework for classifying human–human teams. This taxonomy includes ten attributes: team composition, task interdependence, role structure, leadership structure, leadership role assignment, communication structure, communication direction, communication medium, physical distribution, and team life span.

(2) Utilizing this taxonomy, we performed a comprehensive review of 103 testbeds used in 235 studies. This analysis offers detailed insights into the structures and characteristics of these testbeds, highlighting their unique functionalities and configurations.

(3) We identified significant gaps in the current design of testbeds and studies of HATs. Our findings reveal that the majority of testbeds were designed to examine dyadic HATs with fixed role and leadership structures, relying on primitive communication channels. We highlight the need for more flexible, customizable testbeds that allow researchers to manipulate various team attributes, paving the way for richer investigations into how team attributes influence HAT performance.

Method

Development of Taxonomy for Human–Agent Teams (HATs)

To develop a comprehensive HAT taxonomy for analyzing and defining team characteristics in existing HAT testbeds, we referred to previous literature on human team classification. We first referred to frameworks and taxonomies for classifying human–automation interaction (Endsley & Kaber, 1999; Hancock et al., 2011; Parasuraman et al., 2000). Yet, their focus is primarily on defining levels of automation and outlining the factors that shape human–automation interaction. Their perspective is not necessarily centered on analyzing how humans and agents, as a “team,” work together to achieve a common goal, which is the focus of the current review.

A work team and teaming are defined as the adaptive, interdependent, and dynamic interaction of two or more individuals working toward a common and valued goal (Salas et al., 1992). This definition remains applicable regardless of whether the team members are human or computer-programmed intelligent agents. Consequently, we decided to adopt a well-defined framework from traditional human team research that outlines a holistic set of team-level attributes, and adapted it to the HAT context.

Several taxonomies or systematic frameworks of key team dimensions have been proposed in the human team literature. For example, Hollenbeck et al. (2012) defined three major axes that characterize various types of teams: skill differentiation, which concerns the role assignments of team members; authority differentiation, which defines leadership and hierarchical structures; and temporal stability, which describes whether structural linkages among team members are short-term or long-lasting. While insightful, this framework lacks an explicit emphasis on interaction and communication processes. Tiferes and Bisantz (2018) proposed a broader set of eleven team dimensions; however, many of these focus more on task demands or individual characteristics (e.g., task load, time pressure, and prior experience) than on the structural attributes of the team itself. Other influential frameworks, such as those from Salas et al. (2005) and Klein et al. (2010), emphasize behavioral markers of effective teamwork, including mutual performance monitoring, backup behavior, and sensemaking. While these are valuable for evaluating team functioning, they are less suited for classifying HATs with a special focus on testbed design.

Our HAT taxonomy is an adaptation of the taxonomy for human teams introduced by Wildman et al. (2012). This taxonomy, widely recognized as one of the most robust frameworks for classifying human teams, is grounded in team taxonomic literature from organizational psychology (Bell & Kozlowski, 2002; Keyton & Beck, 2008; Pugh et al., 1969). It offers three major advantages over alternative taxonomies. First, it provides a clear conceptual distinction between task type and team characteristics, a distinction often blurred in other frameworks. Second, it categorizes teams into mutually exclusive and comprehensive classes. Third, the six attributes (task interdependence, role structure, leadership structure, communication structure, physical distribution, and team life span) comprehensively encompass the majority of the dimensions presented in other team classification schemes. We retained all six attributes, as they serve as core characteristics that explain the team properties of HATs. To further tailor the framework to HATs, we introduced four additional attributes: team composition, leadership role assignment, communication direction, and communication medium. As a result, the HAT taxonomy now comprises ten unique attributes (see Table 1).

Table 1.

Taxonomy for Human–Agent Teams (HATs).

Attribute	Definition	Category
Team composition	Team configuration as characterized by “#Human - #Agent”	One-to-one (1–1)
		One-to-many (1-M)
		Many-to-one (M-1)
		Many-to-many (M-M)
Task interdependence	The extent to which outcomes of the team members are influenced by, or depend on, the action of others	Pooled
		Sequential
		Reciprocal
		Intensive
Role structure	The extent to which roles are fundamentally different and therefore not interchangeable	Functional
		Divisional
Leadership structure	The pattern, or distribution, of leadership functions such as setting direction and aligning goals among the members of the team	External manager
		Designated
		Temporary
		Distributed
		None
Leadership role assignment	Classification of whether the human, the agent, or either assumes the leadership roles	Human
		Agent
		Human/Agent
		None
Communication structure	The pattern, or flow, of communication and information sharing among the members of the team	Dyadic
		Hub-and-wheel
		Chain
		Star
Communication direction	Communication direction between humans and agents, among humans, and among agents	Unidirectional
		Bidirectional
Communication medium	Methods available to interact and exchange information	UI control-based
		Text-based
		Speech-based
		Haptic-based
		Gesture-based
		Biosignal-based
		Multi-modal
Physical distribution	Spatial location of the team members in reference to one another	Colocated
		Distributed
		Mixed
Team life span	The length of time for which the team exists as a functional, active unit	Ad hoc
		Long-term

While the original taxonomy emphasizes mutually exclusive categories, our adapted HAT taxonomy can be applied more flexibly based on experimental configurations. This approach allows for double-coding when a single testbed supports multiple configurations.
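One way to make the taxonomy and its double-coding rule concrete is to store the categories of Table 1 as data and check each coded testbed against them. The sketch below is illustrative only: the names `HAT_TAXONOMY` and `validate_coding`, the lowercase labels, and the example coding are assumptions introduced here, not part of the study's coding procedure.

```python
# Categories are copied from Table 1. The dictionary layout, the lowercase
# labels, and the helper below are illustrative assumptions.
HAT_TAXONOMY = {
    "team_composition": {"1-1", "1-M", "M-1", "M-M"},
    "task_interdependence": {"pooled", "sequential", "reciprocal", "intensive"},
    "role_structure": {"functional", "divisional"},
    "leadership_structure": {
        "external manager", "designated", "temporary", "distributed", "none",
    },
    "leadership_role_assignment": {"human", "agent", "human/agent", "none"},
    "communication_structure": {"dyadic", "hub-and-wheel", "chain", "star"},
    "communication_direction": {"unidirectional", "bidirectional"},
    "communication_medium": {
        "ui control-based", "text-based", "speech-based", "haptic-based",
        "gesture-based", "biosignal-based", "multi-modal",
    },
    "physical_distribution": {"colocated", "distributed", "mixed"},
    "team_life_span": {"ad hoc", "long-term"},
}


def validate_coding(coding):
    """Return a list of problems in one testbed's coding (empty if valid).

    Each attribute maps to a *set* of categories, so a testbed that supports
    several configurations can be double-coded.
    """
    problems = []
    for attribute, allowed in HAT_TAXONOMY.items():
        values = coding.get(attribute)
        if not values:
            problems.append(f"missing attribute: {attribute}")
            continue
        for value in sorted(values):
            if value not in allowed:
                problems.append(f"unknown category for {attribute}: {value}")
    for attribute in coding:
        if attribute not in HAT_TAXONOMY:
            problems.append(f"unknown attribute: {attribute}")
    return problems


# Hypothetical coding of an unnamed testbed, double-coded on team composition.
example_coding = {
    "team_composition": {"1-1", "1-M"},
    "task_interdependence": {"reciprocal"},
    "role_structure": {"functional"},
    "leadership_structure": {"designated"},
    "leadership_role_assignment": {"human"},
    "communication_structure": {"hub-and-wheel"},
    "communication_direction": {"bidirectional"},
    "communication_medium": {"ui control-based"},
    "physical_distribution": {"distributed"},
    "team_life_span": {"long-term"},
}
print(validate_coding(example_coding))
```

Using sets rather than single values is the design choice that accommodates double-coding; a strictly mutually exclusive scheme would store one category per attribute instead.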

Team composition refers to the mix of humans and artificial agents that make up the team to achieve a common goal. In the context of HATs, it is essential to consider the composition of the team, as it not only influences the behaviors of each individual within a team but also the collective behaviors of the team. A large portion of early studies in HATs focuses on the classical dyadic interaction between one human and one agent. However, there is growing attention to non-dyadic teams (Schneiders et al., 2022; Guo et al., 2023a, 2023b; O’Neill et al., 2022). There are four discrete classes for this attribute: one-to-one (1–1), one-to-many (1-M), many-to-one (M-1), and many-to-many (M-M). One-to-one (1–1) refers to one human interacting with one artificial agent. This is the classical dyadic interaction. In the one-to-many (1-M) category, one human interacts with at least two artificial agents. Studies on human-swarm interaction (Nagi et al., 2014) often fall into this category. In the many-to-one (M-1) category, at least two humans interact with one artificial agent. Examples include a robotic tour guide in museums, wherein many visitors interact with one artificial agent (Faber et al., 2009). The last category, many-to-many (M-M), refers to a team where at least two humans interact with at least two artificial agents. One example of this category is the study of Guo et al. (2023b) wherein two humans and two artificial agents perform a search detection task.
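The four team composition classes follow directly from the numbers of humans and agents, which the following minimal sketch illustrates. The function name `classify_team_composition` is hypothetical, and the labels use plain hyphens.

```python
def classify_team_composition(n_humans: int, n_agents: int) -> str:
    """Return the '#Human-#Agent' class: '1-1', '1-M', 'M-1', or 'M-M'.

    'M' stands for "at least two", as in the definitions above.
    """
    if n_humans < 1 or n_agents < 1:
        raise ValueError("a HAT needs at least one human and one agent")
    human_side = "1" if n_humans == 1 else "M"
    agent_side = "1" if n_agents == 1 else "M"
    return f"{human_side}-{agent_side}"


# Two humans and two agents, as in the search detection example above.
print(classify_team_composition(2, 2))
```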

The task interdependence attribute refers to the extent to which team members’ outcomes depend on the actions of others (Saavedra et al., 1993). This attribute can be categorized into four distinct levels. At the pooled level, each team member, either a human or an agent, independently contributes to the outcome without requiring interaction with others. At the sequential level, one team member’s actions must precede another team member’s actions, resembling an assembly line setup. At the reciprocal level, team members interact with each other in a back-and-forth manner on a one-to-one basis, but not simultaneously with multiple members. Lastly, intensive task interdependence represents the highest degree of interaction. At this level, the entire team collaborates and works together as a cohesive unit to achieve their shared objective.

Role structure describes the extent to which team roles are specialized or interchangeable (Thylefors et al., 2005). Work can be divided in a functional or divisional manner (Harris & Raviv, 2002; Wildman et al., 2012). If team members perform different roles that are not interchangeable, this is considered a functional role structure. Examples include the cross-functional design team comprising people from marketing, engineering design, user experience, and manufacturing (Yang et al., 2012). On the contrary, if every team member can perform the same function and can replace one another, this is considered a divisional role structure.

Leadership structure refers to how leadership functions are distributed within a team (Wildman et al., 2012). Wildman et al. (2012) identified four configurations of leadership structure by incorporating relevant leadership literature: external manager, designated, temporary, and distributed. In the external manager structure, the leadership role is performed by someone outside the team. The designated structure involves assigning one team member as the leader. In the temporary structure, leadership rotates among team members across different tasks and timeframes, with different members acting as the leader as needed. In the distributed structure, leadership functions are divided among various team members. Additionally, a none type is introduced into our taxonomy for the leadership structure attribute. This type is relevant in the context of HATs since there are instances where neither a human nor an agent team member is responsible for the leadership role during the experiment (Frieder et al., 2021; Hoffman & Breazeal, 2007).

Leadership role assignment refers to who has the decision-making responsibility and assumes the leadership roles (Hollenbeck et al., 2012). In our taxonomy, this attribute specifically identifies who (human or agent) holds the authority to make final decisions, which is closely associated with the autonomy level of an agent. For instance, among the 10 autonomy levels proposed by Sheridan and Verplank (1978), at the lower levels (1–6), humans retain the responsibility for making final decisions, even though automation may provide recommendations or suggestions. On the other hand, at the higher levels (7–10), automation is granted the authority to determine actions without requiring explicit human approval or intervention.
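The link between autonomy level and final decision authority described above can be written as a simple threshold rule. This is a sketch of the split as stated in this paragraph (levels 1–6 versus 7–10); the function name is hypothetical, and the rule does not cover the Human/Agent and None categories of the taxonomy, which depend on the testbed rather than on a single autonomy level.

```python
def final_decision_authority(autonomy_level: int) -> str:
    """Map an autonomy level (1-10) to who makes the final decision.

    The 1-6 / 7-10 split follows the paragraph above; it is a simplification
    used for illustration, not a coding rule from the study.
    """
    if not 1 <= autonomy_level <= 10:
        raise ValueError("autonomy level must be between 1 and 10")
    return "Human" if autonomy_level <= 6 else "Agent"


for level in (3, 6, 7, 10):
    print(level, final_decision_authority(level))
```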

Communication structure refers to the specific patterns of communication that exist within a team (Dyer, 1984). There are three primary communication patterns: hub-and-wheel, star, and chain. In the hub-and-wheel communication structure, communication flows through a central control team member, often the team leader, known as the “hub.” This central figure then disseminates information to all other team members, creating a hub-and-spoke-like network of communication. Chain structure applies when communication follows a hierarchical path “up and down the line.” Information is passed from one team member to the next in a linear sequence, based on the established hierarchy within the team. In star structure, information is shared freely among all team members without any central point of contact or hierarchical structure. This type of communication allows for open and direct communication between team members, promoting a free-flowing exchange of ideas and information. This category also applies when all team members freely exchange information via shared displays. In addition, a dyadic type is added to the communication structure attribute. This is relevant in HAT literature, where many scenarios involve a team comprising one human and one artificial agent (e.g., Hanna et al., 2015; Harriott et al., 2012). In these scenarios, communication is limited to a dyadic interaction between the human and the agent, as the team consists of only two members.
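The four communication structures can be read as patterns in a graph whose nodes are team members and whose edges are communication links. The sketch below is one possible operationalization under stated assumptions: links are treated as undirected, "star" is taken to mean that every member communicates directly with every other member (as defined above), and the function and helper names are hypothetical. With exactly three members, a hub with two spokes and a three-member chain are the same graph, so the sketch reports "hub-and-wheel" in that case; telling them apart would require knowledge of the team hierarchy.

```python
from collections import defaultdict


def _is_connected(members, neighbors):
    """Depth-first search: can every member be reached from the first one?"""
    seen = {members[0]}
    stack = [members[0]]
    while stack:
        current = stack.pop()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(members)


def classify_communication_structure(members, links):
    """Classify undirected communication links among team members."""
    members = list(members)
    n = len(members)
    if n < 2:
        raise ValueError("a team needs at least two members")
    known = set(members)
    neighbors = defaultdict(set)
    for a, b in links:
        if a == b or a not in known or b not in known:
            raise ValueError(f"invalid link: {(a, b)}")
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = sorted(len(neighbors[m]) for m in members)
    if n == 2:
        return "dyadic" if degrees == [1, 1] else "unclassified"
    if degrees == [n - 1] * n:
        return "star"            # everyone talks to everyone
    if degrees == [1] * (n - 1) + [n - 1]:
        return "hub-and-wheel"   # all links pass through one central member
    if degrees == [1, 1] + [2] * (n - 2) and _is_connected(members, neighbors):
        return "chain"           # linear "up and down the line" path
    return "unclassified"


team = ["human", "agent1", "agent2", "agent3"]
hub_links = [("human", "agent1"), ("human", "agent2"), ("human", "agent3")]
print(classify_communication_structure(team, hub_links))
```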

Communication direction defines how information is exchanged between human members and artificial agents, among humans, and among agents. Two primary modes of communication direction can be applied across all three types of pairings: unidirectional and bidirectional. In the unidirectional communication mode, information flows in one direction only. In the case of human–agent communication, the direction can be either from humans to agents or from agents to humans. For instance, a human might give a command to a robotic arm in a manufacturing setting, and the robot acts upon the command without providing any feedback to the human (Tian et al., 2023). In contrast, bidirectional communication involves information flowing back and forth between the two parties, allowing both sides to give and receive information (Chiou & Lee, 2016). This type of communication fosters a two-way exchange of information and feedback between the team members.

Communication medium refers to the methods available for interaction and information exchange (Tiferes & Bisantz, 2018). There are seven categories: UI control-based, text-based, speech-based, haptic-based, gesture-based, biosignal-based, and multi-modal. A UI control-based medium includes buttons, menus, sliders, or other user interface elements that a person can manipulate to interact with an artificial agent. For instance, controlling an AI-powered drone using a software interface would be a UI control-based medium. A text-based medium involves the exchange of written information. For example, a user might type queries for a text-based AI chatbot. A speech-based medium involves verbal communication. An example would be a voice-controlled AI assistant like Siri or Alexa, which can receive spoken commands and provide audio responses. A haptic-based medium involves the sense of touch, where force, vibration, or motion is used to communicate with an AI system, such as in certain virtual reality (VR) or augmented reality (AR) environments. A gesture-based medium involves communicating through physical gestures. For example, an AI-powered robot might interpret hand signals from a human team member. A biosignal-based medium involves the use of biosignals such as electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS), and eye gaze data. For example, a brain–computer interface (BCI) enables the translation of brain signals into system inputs for communication. Finally, a multi-modal medium uses a combination of the above media.

Physical distribution refers to the classification of how spatially proximal team members are (Wildman et al., 2012). Teams can be fully colocated, meaning all members are close enough in physical location to collaborate directly on shared tasks. For example, a team assembling furniture together in a room would be considered colocated. Alternatively, teams can be fully distributed, where members are physically separated. In an urban search and rescue task, for instance, team members may be assigned to search different rooms independently while using a messaging tool to communicate the locations of identified victims. Teams can also exhibit a mixed physical distribution, in which a subset of the team is colocated while others are distributed. For example, two pilots who share a cockpit while controlling multiple autonomous UAVs operating in different locations represent a mixed team: the pilots are colocated, while the UAVs are distributed across various areas. In the context of virtual HATs, the physical distribution attribute can be applied by considering the characteristics of the task. The classification depends on whether the task assumes physically colocated cooperation (e.g., assembly), distributed cooperation (e.g., independent reconnaissance over distant locations), or a mixed form involving both.

Lastly, team life span refers to the length of time for which the team exists (Devine, 2002). Teams can be classified as either ad hoc or long-term. An ad hoc team is formed to accomplish a specific, temporary task and dissolves upon its completion, while a long-term team exists for the duration of its defined purpose. It is very important to note that the classification of a team as ad hoc or long-term is relative to how its purpose and duration are defined. For example, a cross-functional team assembled to develop a new software product might include a software engineer, UX designer, product manager, and marketing strategist, each contributing distinct expertise toward a common goal (Yang et al., 2012). From the broader organizational perspective, such a team is considered ad hoc because it exists only temporarily in support of a larger, ongoing mission. However, if the team’s defined purpose is bounded specifically to the software development project, and the team composition remains stable throughout that period, it should be considered long-term within the defined purpose. In traditional human teams and organizational settings, the classification of team life span typically reflects the team’s functional role within a larger organizational structure (Wildman et al., 2012). Adapting this concept to the context of HATs, where the full organizational context is often not reflected in experimental settings, we define team life span as the bounded duration from the start to the end of a team task. If, during the task, team members are instructed or allowed to freely form and dissolve teams based on a series of temporary subtasks, the team is classified as ad hoc. In contrast, if the team configuration remains fixed throughout the task as determined by the experimenter, the team is considered long-term.

Human–Agent Teaming Testbed Analysis

Based on the proposed HAT taxonomy (Table 1), a total of 103 testbeds used in 235 empirical research studies were analyzed. The objective of the analysis was to investigate how the existing HATs are designed and what kind of team characteristics they have.
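A per-attribute frequency analysis of coded testbeds can be sketched as follows. The testbed labels and codings below are toy placeholders rather than data from this review, and the function name is hypothetical. The sketch assumes that a double-coded testbed contributes once to each category it supports, so proportions within an attribute may sum to more than 1; the exact counting rule for double-coded testbeds is an assumption made here.

```python
from collections import Counter


def attribute_frequencies(codings, attribute):
    """Count how many testbeds fall into each category of one attribute.

    `codings` maps a testbed label to {attribute: set of categories}.
    Returns {category: (count, proportion of all testbeds)}.
    """
    counts = Counter()
    for coding in codings.values():
        counts.update(set(coding[attribute]))
    total = len(codings)
    return {category: (n, n / total) for category, n in counts.items()}


# Toy placeholder codings, NOT results from the reviewed testbeds.
toy_codings = {
    "testbed_A": {"team_composition": {"1-1"}},
    "testbed_B": {"team_composition": {"1-M"}},
    "testbed_C": {"team_composition": {"1-1", "1-M"}},  # double-coded
}

for category, (count, share) in sorted(
    attribute_frequencies(toy_codings, "team_composition").items()
):
    print(f"{category}: {count} testbeds ({share:.0%})")
```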

Testbed Search

Building on the recent review by O’Neill et al. (2022), our search strategy applied an expanded set of criteria to capture a more comprehensive set of studies, with particular attention to testbed design. Six inclusion criteria were used to select eligible studies. The study must: (a) be empirical, (b) include a detailed description of the autonomous agent, (c) involve at least one autonomous agent and one human, (d) involve these entities working interdependently on a task toward a shared goal, (e) feature only virtual agents implemented as software programs, excluding physically embodied agents such as industrial robots or cobots that perform actions in the physical world, and (f) provide sufficient descriptions of how human(s) and autonomous agent(s) interact with each other. Note that the first five criteria were adapted from O’Neill et al. (2022). The last criterion was added because our focus is on discovering and analyzing team characteristics featured in testbeds.

Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Moher et al., 2009), we conducted a systematic database search across six electronic databases using a predefined keyword search strategy. The databases included: Web of Science, Scopus, Science Direct, ACM Digital Library, IEEE Xplore Digital Library, and Sage Journals. The search included all records published until December 2024 (inclusive).

To develop our search string, we built on the approach by O’Neill et al. (2022) and expanded their keyword set. Specifically, we broadened the term “teaming” to “team*” to capture records containing “team,” “teams,” and “teaming,” and added key terms of “human-machine interaction” and “human-machine systems.” The full search query was as follows:

Search query = {“human-autonomy team*” OR “human-automation team*” OR “human-agent team*” OR “human-machine team*” OR “synthetic agent” OR “human-automation interaction” OR “human-machine interaction” OR “human-machine systems”} AND teamwork
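For reproducibility, a Boolean query of this form can be assembled from a list of phrases. The sketch below builds a generic string with the eight phrases and the required term from the query above; the function name `build_query` is hypothetical, and the output uses parentheses and straight quotes as a neutral convention. Each database has its own syntax for wildcards, phrase quoting, and field tags, so the string would likely need adapting per database.

```python
TEAM_PHRASES = [
    "human-autonomy team*",
    "human-automation team*",
    "human-agent team*",
    "human-machine team*",
]
OTHER_PHRASES = [
    "synthetic agent",
    "human-automation interaction",
    "human-machine interaction",
    "human-machine systems",
]


def build_query(phrases, required_term):
    """Join quoted phrases with OR, then require one additional term with AND."""
    any_phrase = " OR ".join(f'"{phrase}"' for phrase in phrases)
    return f"({any_phrase}) AND {required_term}"


query = build_query(TEAM_PHRASES + OTHER_PHRASES, "teamwork")
print(query)
```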

In addition to records retrieved through the database search, we included all records identified in O’Neill et al. (2022), along with five additional records recommended by anonymous reviewers.

The database search resulted in a total of 3660 records. After screening abstracts and methodologies, we excluded 2948 records that were duplicates, nonempirical studies (e.g., reviews and workshops), involved physically embodied agents, or were out of scope. We then assessed 712 full-text articles for eligibility, carefully applying the six inclusion criteria. Additionally, to ensure the relevance, reliability, validity, and applicability of each paper, we assessed each study using the following questions: (1) Was the testbed used in a controlled human-subjects experiment? (2) Does the paper include a clear description of the agent’s capabilities, the design of human–agent interaction, and the team configurations? (3) Does the testbed meaningfully simulate a HAT scenario involving shared goals and interdependence? This process resulted in the inclusion of 235 full-text articles.
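The screening counts reported above can be checked with simple arithmetic. In the sketch below, the function name and dictionary keys are hypothetical. The number of articles excluded at the full-text stage is not reported in the text; it is derived here under the assumption that all 235 included articles came from the 712 assessed full texts.

```python
def prisma_flow(identified: int, excluded_at_screening: int, included: int) -> dict:
    """Derive the intermediate counts of a simple screening flow."""
    assessed = identified - excluded_at_screening
    if assessed < 0 or included > assessed:
        raise ValueError("counts are inconsistent")
    return {
        "identified": identified,
        "excluded_at_screening": excluded_at_screening,
        "full_text_assessed": assessed,
        # Derived value, not reported in the text (see the assumption above).
        "excluded_at_full_text": assessed - included,
        "included": included,
    }


flow = prisma_flow(identified=3660, excluded_at_screening=2948, included=235)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```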

Finally, we compiled all included full-text articles (n = 235) and identified testbeds described in the literature, yielding a total of 103 testbeds. Figure 1 illustrates the overall flow of the literature search. The final list of testbeds included in this paper can be found in Table 2.

Figure 1.

PRISMA flow diagram for literature search.

Table 2.

List of Testbeds.

	Name	Studies
1	MIX (Mixed Initiative Experimental Testbed)	Chen et al. (2008); Chen et al. (2010b, a); Chen and Barnes (2010); Chen, Barnes, Qu, & Snyder (2010); Chen, Barnes, & Qu (2011); Chen, Barnes, & Kenny (2011); Chen, Barnes, Quinn, & Plew (2011); Chen and Barnes (2012); Chen et al. (2013); Wright et al. (2013, 2014, 2016a, b, 2018)
2	CERTT (Cognitive Engineering Research on Team Tasks-Remotely Piloted Aircraft Task Environment-Synthetic Task Environment)	Cooke et al. (2016); Demir and Cooke (2014); Demir et al. (2016); Demir, Amazeen, et al. (2017); Demir, McNeese, & Cooke (2017); Demir, Cooke, & Amazeen (2018); Demir, Likens, et al. (2019); Demir, McNeese, & Cooke (2018); Demir, McNeese, Johnson, et al. (2019); Gorman et al. (2019); Grimm et al. (2018b, a); McNeese et al. (2018); McNeese et al. (2019); Bhatti et al. (2021); Grimm et al. (2023); Demir et al. (2021); Duan, Weng, et al. (2024); Scalia, Harrison, et al. (2022); Demir et al. (2023a); McNeese et al. (2021); Cohen et al. (2021); Johnson, Demir, McNeese, et al. (2023); Johnson et al. (2020); Duan, Zhou, et al. (2024); Demir et al. (2023b); McNeese, Demir, et al. (2021); Demir, McNeese, & Cooke (2019); Scalia, Zhou, et al. (2022); Ball et al. (2010); McNeese et al. (2021)
3	UxV_Sim (Simulation of a Network of Decentralized Collaborative UxVs)	Cummings et al. (2012, 2010); Clare et al. (2012)
4	AST (Autonomous Synthetic Teammate within a Simulated Remotely Piloted Aerial System)	Freiman et al. (2018); Myers et al. (2019)
5	Teleop_Sim (Simulation of Drones Teleoperation)	Memar and Esfahani (2018)
6	SIL (Simulation Integration Laboratory)	De Visser and Parasuraman (2011)
7	ATC (Air Traffic Control)	de Rooij et al. (2024)
8	CCA (Collaborative Combat Aircraft)	Lyons et al. (2024)
9	MAHVS (Multiple Autonomous Heterogeneous Vehicle Simulation)	Gall and Stanton (2024)
10	UAV_Surv (Simulation of Surveillance Task Involving UAVs)	Zhu et al. (2023)
11	RESCHU (Research Environment for Supervisory Control of Heterogeneous Unmanned Vehicles)	Chien et al. (2016); Walliser et al. (2023); Chien et al. (2018)
12	MITPAS (Mixed Initiative Team Performance Assessment System)	De Visser et al. (2006)
13	TBC (Three-Block Challenger; Recognition Primed Decision-Enabled Collaborative Agents)	Fan et al. (2005, 2008); Fan, McNeese, & Yen (2010); Fan and Yen (2011); Yen et al. (2006); Fan, McNeese, Sun, et al. (2010); Fan et al. (2006)
14	TANDEM (Tactical Navy Decision Making)	Lenox et al. (1999); Lewis et al. (2003); Sycara (2002); Lenox et al. (1997); Sycara et al. (1998); Lenox et al. (1998)
15	UMAST (UAV Modeling and Analysis Simulator Testbed with the UMAST Decision-aiding Tool)	Ruff et al. (2002)
16	SGD (Strike Group Defender)	Walliser et al. (2017, 2019)
17	L4D2 (Left4Dead2)	Wehbe et al. (2017)
18	TDT (Theater Defense Task)	M. C. Wright and Kaber (2005, 2003)
19	TA (Threat Assessment)	Matthews et al. (2024); Lin et al. (2022)
20	VSCS (Vigilant Spirit Control Station-used by the Air Force to develop interfaces to control multiple UAVs)	El Iskandarani et al. (2023); Atweh and Riggs (2024a); Schneider et al. (2022); Atweh, Hazimeh, & Riggs (2023); Atweh, Hayek, & Riggs (2023); Atweh and Riggs (2024b)
21	TSF (Team Space Fortress)	H. Li et al. (2021); Capiola, Lyons, et al. (2023)
22	ATDS (Automated Target Detection System)	Albayram et al. (2020); Fahim et al. (2023)
23	Combat_Sim (Simulated Environment to Operate Combat Vehicles)	Mullins et al. (2024)
24	FM_Sim (Flight Mission Simulator)	Panganiban et al. (2020)
25	Navy_Sim (Simulated Navy Environment)	Dikmen et al. (2020, 2019)
26	TAISR (Trust-Aware Intelligence, Surveillance, and Reconnaissance)	Bhat et al. (2024a, 2022)
27	StarCraft	Evertsz and Thangarajah (2020)
28	Netrek	Mallick, Flathmann, Lancaster, et al. (2024); Mallick, Flathmann, Duan, et al. (2024)
29	ISR_Wargame (Intelligence, Surveillance, and Reconnaissance Wargame Control)	Agbeyibor, Ruia, Kolb, & Feigh (2024); Agbeyibor, Ruia, Cortes, et al. (2024)
30	ISRM (Intelligence, Surveillance, and Reconnaissance Missions)	Rebensky et al. (2022)
31	DDD (Dynamic Distributed Decision Making Simulation)	De Visser et al. (2010); Ahmed et al. (2014); McKendrick et al. (2011, 2014)
32	Recce_Sim (Simulated Environment for Reconnaissance)	Harriott et al. (2012)
33	WebHRC (Web-based Human-Robot Collaboration Testbed)	C. Liu et al. (2018)
34	USAR_Vict:a (Urban Search and Rescue for Victims: Version.a)	Narayanan et al. (2015)
35	OSG (Online Search Game)	Khavas and Robinette (2024); Khavas et al. (2024); Rezaei Khavas et al. (2024)
36	VMM (Virtual Military Mission: Transport and Reconnaissance Operation)	Kox et al. (2024, 2021)
37	USAR_Vict:b (Urban Search and Rescue for Victims: Version.b)	van den Bosch et al. (2024); Schadd et al. (2022); Schoonderwoerd et al. (2022)
38	TRESCHU (Team Research Environment for Supervisory Control of Heterogeneous Unmanned Vehicles)	Gao and Cummings (2014); Gao et al. (2016)
39	USAR_Minecraft:a (Minecraft Urban Search and Rescue: Version.a)	Demir, Cohen, et al. (2023); Chiou et al. (2022); Demir, McNeese, & Cooke (2020)
40	IIHAT (Implicit Interaction for Human-Autonomy Teams)	Schelble, Flathmann, et al. (2023); Musick et al. (2021)
41	USAR_MrCS (Urban Search and Rescue with Multi-robot Control System)	Gao (2013); Gao et al. (2014, 2012)
42	SSE (Sensitive Site Exploitation)	Frericks et al. (2024)
43	USAR_Minecraft:b (Minecraft Urban Search and Rescue: Version.b)	Bendell et al. (2024)
44	PC (Perfect Circle)	Prada and Paiva (2009)
45	Debris	Ulusan et al. (2022)
46	C&S (Cordon and Search)	Wright et al. (2022); Lakhmani et al. (2019, 2020)
47	RescueBot	Verhagen et al. (2024, 2023)
48	AI_SAR (AI-guided Search and Rescue)	Zhao et al. (2022)
49	F&R (Find and Remove Objects)	Rossi et al. (2017)
50	S/B (Serbia/Bosnia Domain Game)	Frieder et al. (2021)
51	SN (Space Navigator)	Goodman et al. (2016); Hillesheim and Rusnock (2016); Bindewald et al. (2020)
52	IVA:a (Intelligent Virtual Agent: Version.a)	Hanna et al. (2015)
53	MokSAF (Modular Semi-Automated Forces)	Lenox et al. (2000); Lewis et al. (2003); Payne et al. (2000); Sycara (2002)
54	FUSION (U.S. Air Force Research Laboratory’s FUSION Interface)	Mercado et al. (2016)
55	HiveMind	Farah and Dorneich (2024)
56	MLG (Moon Lander Game)	Momose et al. (2023, 2024)
57	Swarm	Capiola, Johnson, et al. (2023); Capiola et al. (2024)
58	BTRA (Boundary Tracking Robot Agent)	Wong et al. (2019); Xu and Dudek (2016)
59	FL (Frozen Lake)	Natarajan et al. (2024)
60	IVA:b (Intelligent Virtual Agent-Version.b)	Hanna and Richards (2014); Hanna et al. (2013); Hanna and Richards (2019, 2018)
61	BW4T (Blocks World for Teams)	Harbers, Bradshaw, Johnson, Feltovich, van den Bosch, & Meyer (2011); Harbers, Bradshaw, Johnson, Feltovich, Van Den Bosch, & Meyer (2011); Johnson et al. (2012); Li et al. (2016)
62	Fcty_Sim (Simulated Factory)	Hoffman and Breazeal (2007)
63	CHAOPT (Cooking with Humans and Autonomy in Overcooked for studying Performance and Teaming)	Meimandi et al. (2024); Long et al. (2024); Le Guillou et al. (2023); Paleja et al. (2024); Li et al. (2024)
64	Overcooked (Overcooked!2 developed by AFRL and GRILL at the U.S. Air Force Academy)	Bishop et al. (2020)
65	CIR (Collaborative Industrial Robot)	Simon et al. (2023)
66	WS (2D Grid World Supermarket)	Jorge et al. (2024)
67	CT (Colored Trails)	van Wissen et al. (2010, 2012)
68	AQM (Automatic Quality Monitor)	Yu et al. (2019)
69	FtR (For the Record)	Tulli et al. (2019)
70	LEGO (LEGO Construction)	Demir, Amazeen, & Cooke (2020)
71	M_Box (Moving Boxes)	Centeio Jorge et al. (2023)
72	CiscoPT (Cisco Packet Tracer)	Hauptman et al. (2024)
73	HyForm	Xu et al. (2023); Song et al. (2022)
74	CSW (Cooperative Shared Workspace)	Salikutluk et al. (2024)
75	House_Minecraft (Minecraft House Construction)	Paleja et al. (2021)
76	P&S_Box (Pick and Store Box)	Esterwood and Robert (2023)
77	Transp_Box (Box Transportation)	Liu et al. (2019)
78	ArmA3 (ArmA3 first-person game)	Zhang et al. (2023); Schelble, Lancaster, et al. (2023)
79	AWMS (Automated Warehouse Management System)	Barg-Walkow and Rogers (2016)
80	HS (Hospital Scheduling)	Chiou and Lee (2016); J. Li et al. (2020); León et al. (2021)
81	CCP (Collaborative Calendar Planning)	Kaelin et al. (2024)
82	SPO_Playbook (Single Pilot Operations Playbook)	Tokadlı et al. (2021)
83	CHATboard (Collaborative Human-Agent Taskboard)	Abuhaimed et al. (2023); Abuhaimed and Sen (2022b); Lavender et al. (2024); Abuhaimed and Sen (2022a, 2023); Lavender et al. (2023); Abuhaimed & Sen (2024)
84	ETP (Expert Travel Planner)	Bubb-Lewis and Scerbo (2002)
85	MRM (Multi-Robot Missions)	Al-Hussaini et al. (2024)
86	UxV_Planning (UxV Planning for Offshore and Onshore Surveillance)	Vered et al. (2020)
87	Mars_Sim (Cognitive Assistant for a Simulated Mars Mission)	Tokadlı and Dorneich (2022)
88	NeoCITIES	Schelble, Flathmann, Musick, et al. (2022); Schelble, Flathmann, McNeese, et al. (2022); McNeese, Schelble, et al. (2021)
89	RW4T (Rescue World for Teams)	Qian et al. (2024)
90	CHART (Computer-Human Allocation of Resources Testbed)	Doherty et al. (2023); Eloy et al. (2023); Bobko et al. (2023)
91	FCI_Sim (Fire, Crimes, and Injuries Simulation)	Graf et al. (2023)
92	DEFACTO (Demonstrating Effective Flexible Agent Coordination of Teams through Omnipresence)	Schurr et al. (2005)
93	BS (Blanket Search)	Schadd et al. (2022)
94	ACFP (Autonomous Constrained Flight Planner)	Brandt et al. (2018); Strybel et al. (2018)
95	RL (Rocket League)	Flathmann et al. (2024, 2023); Zhang et al. (2024)
96	Hanabi	Attig et al. (2024); Siu et al. (2021); Sidji et al. (2023)
97	Chess (Chess puzzle)	Zhang et al. (2023a)
98	Checkmate	Alarcon et al. (2024)
99	Tetris	Kulms and Kopp (2019)
100	SI (Space Invader)	Candon et al. (2022); Large et al. (2020)
101	GoT (Game of Trust)	Hafızoğlu and Sen (2018); Hafizoglu and Sen (2018, 2020); Hafizoğlu and Sen (2019)
102	BIE (Bacterial Infestation Estimation)	Fahnenstich et al. (2024)
103	GenAI_CoDesign	Han et al. (2024)

Testbed Coding Based on Team Classification Taxonomy

Using our taxonomy (Table 1), we reviewed and coded all 103 testbeds used in 235 empirical studies. Because this paper focuses on analyzing and classifying the testbeds, we thoroughly recorded and scrutinized the information related to them.

Furthermore, to enhance our understanding of how the testbeds have been implemented and utilized, we recorded additional information about each testbed beyond the ten team classification attributes: the domain to which the testbed belongs and its main task goal, the accessibility of the testbed, whether the Wizard of Oz (WoZ) technique (Dahlbäck et al., 1993) was employed to implement agent roles, and the types of metrics recorded to analyze team outcomes.

In terms of the testbed accessibility, several testbeds explicitly referred to publicly available applications (e.g., Overcooked game), or publicly shared their testbeds on a developer platform such as GitHub. Since this information provides useful insight into how the testbeds were developed and how future studies might build on them, we recorded the testbed accessibility details as reported in each paper.

The information on WoZ relates to how agent intelligence was implemented. The use of the WoZ technique implies that a human experimenter operated in place of the agent while deceiving participants into believing they were interacting with autonomous virtual teammates. If any empirical study using a given testbed specified the use of WoZ, we noted this in our records for that testbed.

Finally, the data recorded and generated by each testbed serve as key evaluation metrics for assessing HATs. Accordingly, we documented the types of dependent variables related to team outcomes, such as performance, behavior, and communication, based on the results reported in the original papers.

During the coding process, the researchers carefully examined the explicit explanations and illustrations provided in the papers. We tried to avoid any inference about the capabilities and underlying algorithms of computer-programmed agents. Instead, we focused on documenting the specific tasks assigned to human participants and agents, as well as their actions during the experiments. In cases where the team composition attribute was used as an independent variable (e.g., comparing the team performance of all-human teams to that of teams with artificial agents; Fan et al., 2005), we only considered and recorded conditions where at least one agent was present. The coding was conducted at the experimental session level, based on the specific team configuration used in each session. In other words, when a single testbed fell into multiple categories of an attribute across different experimental sessions (e.g., one human and one agent in one session, and one human and two agents in another session using the same testbed), we double-coded the testbed into all applicable categories (in this example, both 1-1 and 1-M).

Two researchers, HC and TH, first carefully reviewed n = 8 randomly selected research records and independently coded the m = 4 testbeds used in those eight studies. All authors then held sessions to discuss and calibrate the coding logic, resolving any confusion in the coding scheme. HC and TH subsequently coded another 21 testbeds. Inter-rater reliability for these first 25 testbeds reached 91%. Where discrepancies arose, all authors convened to share their perspectives and reach a consensus. HC then independently coded the remaining testbeds. Finally, descriptive analyses were conducted to examine the coding results for each attribute in the team classification taxonomy.
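As an illustration of how an agreement figure such as the 91% above can be obtained, the following minimal sketch computes simple percent agreement between two coders. It assumes that percent agreement is the measure in question, which the text does not state; the function name `percent_agreement` and the four example codes are hypothetical and are not the study data.

```python
def percent_agreement(codes_a, codes_b):
    """Return the percentage of items that two coders labeled identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same number of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical team composition codes for four items (illustrative only).
coder_1 = ["1-1", "1-M", "M-1", "1-1"]
coder_2 = ["1-1", "1-M", "M-M", "1-1"]

agreement = percent_agreement(coder_1, coder_2)
print(agreement)  # 75.0 (three of the four codes match)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would be an alternative if that property is required.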

Results

We identified a total of 103 different testbeds from the 235 empirical research studies analyzed (see Table 2). It is worth noting that some testbeds were utilized in multiple research studies, with CERTT (testbed #2) (n = 33) and MIX (testbed #1) (n = 15) being the two most frequently employed. Most of the other testbeds appeared in only one or two research studies. The frequency analysis of the testbeds is visualized in Figure 2.

Figure 2.

Overall testbed frequency (only testbeds with frequency ≥2 are presented (n = 47)).

For each testbed, we recorded its task goal and overall interaction scenario, which provides insights into the specific focus of the testbeds (see Appendix A). Notably, a significant proportion of the testbeds are situated within military-related fields. Consequently, many of the testbeds’ task goals are associated with target search and elimination, navigation, and route planning. For instance, in the MIX (testbed #1), human participants are tasked with identifying targets by monitoring and supervising subordinate unmanned vehicles. In another testbed, TANDEM (testbed #14), the goal of the team is to closely communicate with each other to gather information about targets and make proper decisions on whether to engage or clear each target.

Another significant portion of the testbeds focuses on search and rescue tasks. For example, USAR_Vict:a (testbed #34) and USAR_Vict:b (testbed #37) both feature a human–agent dyad rescue team collaborating to perform urban search and rescue tasks. The first testbed’s shared task involves reporting the number of casualties in as many rooms as possible, while the latter focuses on exploring buildings damaged by an earthquake and extracting victims from the disaster area as quickly as possible.

There are also some testbeds that offer relatively unique tasks. For instance, in the BW4T (testbed #61), human participants are required to deliver a sequence of blocks in a specific order by cooperating with agent teammates. CHAOPT (testbed #63) simulates a kitchen environment where a human and an agent collaborate to prepare dishes. CHATboard (testbed #83) focuses on task allocation dynamics. In this testbed, a human participant selects from a list of four subtasks (identifying language, identifying a landmark, solving a word grid, and identifying an event) and continues making selections throughout the interaction, indicating which tasks they want their agent teammate to perform on their behalf. The selection happens within the context that the task set must be effectively completed by both team members. In addition, the GoT (testbed #101) involves two players collectively completing a team task of audio transcription.

Team Composition

Regarding the team composition attribute, the analysis reveals that among the 103 testbeds, 56.3% (n = 58) of them are designed as environments where a participant interacts with one agent, referred to as one-to-one (1–1) configuration. The second most prevalent type is one-to-many (1-M), where a single human participant works on a task with multiple agents. For example, in the UxV_Sim (testbed #3), a human operator is responsible for identifying and attacking targets using unmanned vehicles, with the assistance of an autonomous planner that provides scheduling recommendations to the human.

There are also cases where multiple human operators are involved, leading to configurations such as many-to-one (M-1) and many-to-many (M-M). For example, in many studies utilizing the CERTT (testbed #2), two human participants serve as a navigator and a photographer, respectively, collaborating closely with an agent functioning as a pilot. The team comprises three members, exhibiting a many-to-one (M-1) team composition, and they collaborate to take photographs of ground targets.

Notably, there are relatively few testbeds (n = 8) classified under the many-to-many (M-M) configuration. For instance, in TDT (testbed #18), two human players, each supported by a decision-support agent embedded in their screens, engage in a target elimination task. Several testbeds feature larger teams: VSCS (testbed #20) consists of two humans controlling and managing up to 16 UAVs, and Combat_Sim (testbed #23) allows one human commander, six human crew members, and two robotic combat vehicles to complete a mission of navigating terrains and combating enemies. HyForm (testbed #73) mimics teams within a company specializing in package and food delivery using drones. In one study utilizing the testbed, six humans were divided among three teams responsible for drone design, operations planning, and business management, respectively. There were two agents, one supporting the drone design team and the other the operations planning team (Song et al., 2022). The distribution of testbeds based on the team composition attribute can be found in Table 3.

Table 3.

Team Composition.

Team Composition	Testbed Index	Count
One-to-one (1–1)	7, 9, 16, 19, 21, 22, 24, 26, 29, 32, 33, 34, 35, 36, 37, 39, 42, 45, 46, 47, 48, 50, 51, 52, 55, 56, 58, 59, 60, 62, 63, 64, 65, 68, 71, 72, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 87, 89, 94, 95, 96, 97, 98, 99, 100, 101, 102	58
One-to-many (1-M)	1, 2, 3, 5, 6, 8, 10, 11, 12, 13, 15, 17, 27, 28, 30, 31, 37, 39, 40, 44, 49, 54, 57, 61, 66, 67, 69, 70, 85, 86, 88*, 92, 93	33
Many-to-one (M-1)	2, 4, 13, 14, 25, 40, 43, 53, 67, 70, 88, 90, 91, 103	14
Many-to-many (M-M)	18, 20, 23, 31, 38, 41, 67, 73	8

Note. Nine testbeds (testbeds #2, 13, 31, 37, 39, 40, 67, 70, and 88) were classified into multiple categories, as they supported various configurations depending on the experimental conditions.
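The frequency analysis with double-coding can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the dictionary `codes` holds only four entries taken from Table 3, and the variable names are assumptions introduced for the example.

```python
from collections import Counter

# Testbed index -> set of team composition categories (subset of Table 3).
# A testbed coded into several categories counts once in each of them.
codes = {
    2: {"1-M", "M-1"},
    7: {"1-1"},
    18: {"M-M"},
    67: {"1-M", "M-1", "M-M"},
}

# Frequency of each category across all testbeds.
category_counts = Counter(cat for cats in codes.values() for cat in cats)

# Testbeds that were classified into more than one category.
multi_coded = sorted(t for t, cats in codes.items() if len(cats) > 1)

# The sum of category counts exceeds the number of testbeds
# whenever at least one testbed is multi-coded.
total_assignments = sum(category_counts.values())

print(category_counts["1-M"], multi_coded, total_assignments)
```

The same logic accounts for the totals in Table 3: the four category counts sum to 113 rather than 103 because nine testbeds appear in more than one category, one of which (testbed #67) appears in three.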

Task Interdependence

In terms of task interdependence, approximately 41% of the testbeds (n = 42) are classified as intensive. This indicates a high level of interdependence among team members, where their actions and performance significantly rely on each other. The second most common type is reciprocal, which closely corresponds to the one-to-one (1–1) team composition. In this type, a high level of interdependence exists; however, since there are only two members, the interaction between team members occurs exclusively on an individual-to-individual basis, rather than involving multiple members simultaneously.

Twenty testbeds are classified as sequential, where team members act in a sequential manner, with one team member’s action depending on the completion of a previous team member’s action. One example testbed is the Fcty_Sim (testbed #62), where a human and an agent collaborate to assemble carts. In this testbed, the human’s role is to carry the necessary parts for assembly and position them on the workbench. The agent, in turn, fetches the correct tool and applies it to the configuration in a sequential manner.

Notably, only six testbeds are classified as pooled. One of them is the ATDS (testbed #22), where a human and an agent classify vehicles across different locations to identify potential threats, with each responsible for a separate set of locations and thus independently contributing to the team outcome. Another example is the GoT (testbed #101), where two players independently complete team tasks (i.e., audio transcription) without directly communicating with each other. The testbed is specifically designed to create a situation where implicit trust between the human participant and the agent plays a pivotal role in their teaming dynamics. The distribution of testbeds based on the task interdependence attribute can be found in Table 4.

Table 4.

Task Interdependence.

Task Interdependence	Testbed Index	Count
Pooled	22, 35, 47*, 57, 83, 101	6
Sequential	14, 19, 26, 38*, 40, 54, 60, 62, 63, 64, 65, 66, 68, 76, 81, 89, 96, 97, 99, 102	20
Reciprocal	7, 9, 16, 21, 24, 29, 32, 33, 34, 36, 37, 39, 42, 45, 46, 47*, 48, 50, 51, 52, 55, 56, 58, 59, 71, 72, 74, 75, 77, 78, 79, 80, 82, 84, 87, 94, 95, 98, 100	39
Intensive	1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 17, 18, 20, 23, 25, 27, 28, 30, 31, 37, 38, 39*, 41, 43, 44, 49, 53, 61, 67, 69, 70, 73, 85, 86, 88, 90, 91, 92, 93, 103	42

Note. Four testbeds (testbeds #37, 38, 39, and 47) were classified into multiple categories, as they supported various configurations depending on the experimental conditions.

Role Structure

Concerning role structure, a majority of testbeds (n = 79) are classified as functional, where each member has distinct roles that are not interchangeable. For instance, in the CERTT (testbed #2), the two human participants and an agent fulfill different roles as a navigator, photographer, and pilot, respectively, and these roles are not interchangeable. On the other hand, the divisional classification is characterized by humans and agents performing identical tasks to achieve a common goal. For example, in the HS (testbed #80), both a human and an agent act as hospital managers, responsible for coordinating staff resources to maximize patient treatment. The frequency table for the role structure attribute can be found in Table 5.

Table 5.

Role Structure.

Role Structure	Testbed Index	Count
Functional	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 30, 31, 32, 36, 37, 38, 39, 41, 42, 43, 45, 46, 47, 48, 49, 51, 53, 54, 55, 57, 58, 59, 62, 63, 65, 66, 68, 70, 72, 73, 74, 75, 76, 79, 81, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 98, 99, 102, 103	79
Divisional	17, 22, 27, 33, 34, 35, 38, 40, 44, 47, 50, 52, 56, 60, 61, 64, 67, 69, 71, 77, 78, 80, 82, 96, 100, 101	26

Note. Two testbeds (testbeds #38 and 47) were classified into multiple categories, as they supported various configurations depending on the experimental conditions.

Leadership Structure and Leadership Role Assignment

Regarding leadership structure and leadership role assignment, we coded based on the following definition of a leader: a team member who is authorized to lead the team and make final decisions in a situation of disagreement. It is worth noting that in 30 out of the 103 testbeds, there are no explicit leaders in a team (i.e., none). Among the other 73 testbeds, 61 are classified as designated. In the majority of these testbeds (n = 59), humans assume the leader role (leadership role assignment: human). Only two testbeds included a condition in which the agent could also serve as a leader, depending on the experimental setup (leadership role assignment: human/agent). For instance, in the TBC (testbed #13), which involves three members each assigned to the “intelligent,” “operations,” and “logistics” cell roles, the individual responsible for the intelligent cell role could be considered the leader. This role has the authority to determine whether the approaching object is a neutral force or an enemy unit. Depending on the experimental condition, either a human participant or the R-CAST (agent) could perform the intelligent cell role.

Notably, one testbed, CT (testbed #67), was identified as exhibiting a temporary leadership structure, and the leadership role assignment could be either a human or an agent (human/agent). In this testbed, six participants were grouped together to complete multiple rounds of delivering packages, with some members’ identities openly shared as humans while others were presented as agents. Participants freely formed groups of two or three by negotiating. Any team member could voluntarily initiate group formation and propose specific others to collaborate with for each round. As a result, leadership rotated among team members during the task.

Finally, there are twelve testbeds coded as distributed leadership type, and the leader roles are mostly fulfilled by and distributed to humans (leadership role assignment: human). In MokSAF (testbed #53), for instance, three human members, assigned as commanders, collaboratively engage in route planning. They evaluate their plans from a team perspective and iteratively modify them until reaching an acceptable solution agreed upon by the team. In contrast, there were four testbeds where leadership roles were distributed between the human and the agent (leadership role assignment: human/agent). For example, in Netrek (testbed #28), one human and one agent both assumed the “enforcer” roles, proactively taking actions to eliminate enemies and protect their planets by utilizing multiple subordinate autonomous bots. The frequency table for leadership structure and leadership role assignment can be found in Table 6.

Table 6.

Leadership Structure & Leadership Role Assignment.

Leadership Structure	Count	Leadership Role Assignment	Testbed Index	Count
External manager	0	Human	-	0
		Agent	-	0
		Human/Agent	-	0
Designated	61	Human	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 19, 22, 23, 24, 26, 29, 30, 31*, 32, 36, 39, 42, 45, 46, 47, 48, 49, 51, 54, 55, 56, 58, 61, 65, 66, 68, 72, 73, 74, 75, 76, 79, 83, 84, 85, 86, 87, 89, 90, 92, 94, 97, 99, 102	59
		Agent	-	0
		Human/Agent	13, 21	2
Temporary	1	Human	-	0
		Agent	-	0
		Human/Agent	67	1
Distributed	12	Human	20, 25, 31*, 38, 41, 53, 91, 103	8
		Agent	-	0
		Human/Agent	28, 59, 81, 82	4
None	30	None	16, 17, 27, 33, 34, 35, 37, 40, 43, 44, 50, 52, 57, 60, 62, 63, 64, 69, 70, 71, 77, 78, 80, 88, 93, 95, 96, 98, 100, 101	30

Note. One testbed (testbed #31) was classified into multiple categories, as it supported various configurations depending on the experimental conditions.

Communication Structure, Direction, and Medium

In terms of communication structure, the most common type is the dyadic structure, as all 58 testbeds with the team composition configuration of 1–1 inherently represent dyadic communication between the two members. Of these 58 testbeds, the majority (n = 45) feature bidirectional communication between the human and the agent. In contrast, fourteen testbeds feature unidirectional communication between them; one testbed (testbed #46) is coded into both directions. For example, in SN (testbed #51), the agent provides task assistance to the human by generating trajectories, while the human does not provide any feedback to the agent.

Twenty-two testbeds are coded as having a hub-and-wheel communication structure. Within this category, both unidirectional and bidirectional communication between humans and agents were observed. For example, in MITPAS (testbed #12) and Combat_Sim (testbed #23), the human provides unidirectional commands to the agents, which they follow without offering any feedback. In contrast, several other testbeds (e.g., testbeds #1, 3, 11, 15, 54, 85, and 86) involve multiple subordinate agents and one primary decision aid/task assistance agent. The primary agent assists the human, while all other subordinates receive and follow commands from the human, representing a bidirectional exchange of information between the human and the agents. One testbed, WS (testbed #66), demonstrates relatively unique communication dynamics within the hub-and-wheel structure, also featuring bidirectional communication. In this testbed, a human acts as the hub and supports two agents, each requesting help to search for and collect specific items from a virtual supermarket.

Lastly, the remaining testbeds are classified as the star communication structure (n = 26). No testbeds are classified as having a chain structure. Table 7 presents the distribution of the communication structure and direction between humans and agents.

Table 7.

Communication Structure and Communication Direction (H-A).

Communication Structure	Count	Communication Direction (H-A)	Testbed Index	Count
Dyadic	58	Unidirectional	7, 9, 24, 26, 45, 46*, 48, 51, 58, 68, 79, 89, 97, 102	14
Dyadic	58	Bidirectional	16, 19, 21, 22, 29, 32, 33, 34, 35, 36, 37, 39, 42, 46, 47, 50, 52, 55, 56, 59, 60, 62, 63, 64, 65, 71, 72, 74, 75, 76, 77, 78, 80, 81, 82, 83, 84, 87, 94, 95, 96, 98, 99, 100, 101	45
Hub-and-wheel	22	Unidirectional	5, 8, 10, 12, 23, 25, 30, 31, 38, 41, 49, 92	12
Hub-and-wheel	22	Bidirectional	1, 3, 6, 11, 15, 54, 66, 73, 85, 86	10
Chain	0	Unidirectional	-	0
Chain	0	Bidirectional	-	0
Star	26	Unidirectional	14, 18, 20, 53, 57	5
Star	26	Bidirectional	2, 4, 6, 13, 17, 27, 28, 39, 40, 43, 44, 61, 67, 69, 70, 73*, 88, 90, 91, 93, 103	21

Note. Four testbeds (testbeds #6, 39, 46, and 73) were classified into multiple categories, as they supported various configurations depending on the experimental conditions.

In Table 7, we present the classification of communication direction between humans and agents, as this applies to all testbeds, given that every HAT includes communication between the two. For communication direction between humans and between agents, we examined only the testbeds that are not one-to-one (1–1) configurations, as only those are applicable for such classifications.

For the communication direction between humans, we found that all applicable testbeds involved bidirectional communication among human members.

For communication direction among agents in testbeds with multiple agents, the classification is closely tied to the overall communication structure. In a hub-and-wheel configuration, where humans commonly serve as the communication hub and agents typically take subordinate roles, there is no explicit communication between agents. In contrast, in a star configuration, which allows free-flowing communication among all members, agents in most testbeds exhibit bidirectional communication and collaborative teaming relationships (e.g., BW4T, testbed #61).

Regarding the communication medium, approximately 47% of the testbeds (n = 48) were classified as using uni-modal communication, with nearly all of them relying on UI control-based interactions (n = 47). One testbed (F&R, testbed #49) was classified as speech-based, where a human issued voice commands to supervise and direct multiple agents in gathering and removing target objects. The remaining testbeds were classified as multi-modal, incorporating additional communication methods alongside UI control-based interaction. Many supported text-based messaging (e.g., testbeds #19, 53, 61), while others enabled voice chat (e.g., testbeds #18, 24, 77). A few supported both text and speech modalities (e.g., testbeds #81, 82). Additionally, in two of the studies using VSCS (testbed #20), the testbed included shared gaze interfaces while the two humans could also communicate verbally, thereby enabling another type of multi-modal communication. Table 8 presents the distribution of the communication medium.

Table 8.

Communication Medium.

Communication Medium	Testbed Index	Count
UI control-based	6, 7, 9, 10, 12, 14, 17, 21, 25, 26, 27, 29, 30, 32, 33, 35, 40, 45, 51, 56, 57, 59, 62, 63, 65, 67, 68, 71, 74, 75, 76, 79, 80, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102	47
Speech-based	49	1
Multi-modal	1, 2, 3, 4, 5, 8, 11, 13, 15, 16, 18, 19, 20, 22, 23, 24, 28, 31, 34, 36, 37, 38, 39, 41, 42, 43, 44, 46, 47, 48, 50, 52, 53, 54, 55, 58, 60, 61, 64, 66, 69, 70, 72, 73, 77, 78, 81, 82, 83, 84, 85, 86, 87, 88, 103	55

Physical Distribution

For the physical distribution attribute, nearly equal numbers of testbeds were classified as colocated (n = 43) or distributed (n = 47). The colocated tasks include relatively simple threat assessment and identification tasks (e.g., TA, testbed #19) or route planning tasks (e.g., SN, testbed #51) supported by a locally available decision aid, assembly (e.g., Fcty_Sim, testbed #62), cooking (e.g., Overcooked, testbed #64), co-delivery of objects (e.g., M_Box, testbed #71), scheduling and planning (e.g., ETP, testbed #84), and playing card or puzzle games (e.g., Chess, testbed #97).

On the other hand, distributed testbeds center on teleoperating and controlling UAVs (e.g., AST, testbed #4), collaborating to identify targets by dividing roles such as reconnaissance and standby (e.g., TAISR, testbed #26), urban search and rescue tasks with teammates responsible for different locations (e.g., IIHAT, testbed #40), and emergency response (e.g., NeoCITIES, testbed #88).

Lastly, several testbeds exhibit a mixed configuration, with different patterns of team subsets being either colocated or distributed. Identified patterns include cases where all humans are colocated, as well as cases where a human is colocated with an agent, while the remaining team members are distributed. For instance, there are testbeds where a colocated human and decision aid agent supervise and monitor multiple UAVs that are distant and distributed (e.g., RESCHU, testbed #11). In other testbeds, multiple colocated humans control agents that are physically distributed (e.g., USAR_MrCS, testbed #41). Additionally, one testbed, GenAI_CoDesign (testbed #103), features a scenario where two distributed humans are connected via video call and collaborate with generative AI tools running on one of their laptops, so that this person and the agent are colocated. Table 9 presents the distribution of testbeds by physical distribution.

Table 9.

Physical Distribution.

Physical Distribution	Testbed Index	Count
Colocated	7, 9, 14, 19, 21, 24, 29, 33, 36, 37, 45, 51, 52, 53, 54, 56, 59, 60, 61, 62, 63, 64, 65, 67, 68, 69, 71, 72, 74, 76, 77, 79, 81, 84, 87, 94, 95, 96, 97, 99, 100, 101, 102	43
Distributed	2, 4, 5, 6, 8, 10, 12, 13, 16, 17, 18, 22, 25, 26, 27, 28, 30, 32, 34, 35, 38, 39, 40, 42, 43, 44, 46, 47, 48, 50, 55, 58, 66, 70, 73, 75, 78, 80, 82, 83, 88, 89, 90, 91, 92, 93, 98	47
Mixed	1, 3, 11, 15, 20, 23, 31, 41, 49, 57, 85, 86, 103	13
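
The counts in Table 9 can be re-derived from the testbed index lists. The following is a minimal sketch in Python that tallies each category and converts the tallies into shares of the reviewed testbeds. The variable names are illustrative and are not part of the original analysis; the percentages are computed here for convenience and are not reported in this form in the table.

```python
# Testbed indices copied from Table 9. Variable names are illustrative only.
physical_distribution = {
    "Colocated": [7, 9, 14, 19, 21, 24, 29, 33, 36, 37, 45, 51, 52, 53, 54,
                  56, 59, 60, 61, 62, 63, 64, 65, 67, 68, 69, 71, 72, 74, 76,
                  77, 79, 81, 84, 87, 94, 95, 96, 97, 99, 100, 101, 102],
    "Distributed": [2, 4, 5, 6, 8, 10, 12, 13, 16, 17, 18, 22, 25, 26, 27,
                    28, 30, 32, 34, 35, 38, 39, 40, 42, 43, 44, 46, 47, 48,
                    50, 55, 58, 66, 70, 73, 75, 78, 80, 82, 83, 88, 89, 90,
                    91, 92, 93, 98],
    "Mixed": [1, 3, 11, 15, 20, 23, 31, 41, 49, 57, 85, 86, 103],
}

# Frequency analysis: count testbeds per category, then express as a share.
counts = {category: len(indices) for category, indices in physical_distribution.items()}
total = sum(counts.values())
shares = {category: round(100 * n / total, 1) for category, n in counts.items()}

# Sanity check: every testbed index should appear in exactly one category.
all_indices = sorted(i for indices in physical_distribution.values() for i in indices)

for category in physical_distribution:
    print(f"{category}: n = {counts[category]} ({shares[category]}%)")
```

The same pattern applies to any single-coded attribute table in the review; attributes that allow double-coding would need the uniqueness check relaxed.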

Team Life Span

Concerning the team life span attribute, all but three of the testbeds are classified as long-term. This means that, in most testbeds, team characteristics remain consistent throughout the entire duration of the experimental session.

Notably, three testbeds featured ad hoc teams. In TSF (testbed #21), a human and an agent were randomly assigned to one of two roles: either bait (responsible for attracting the fortress’s attention) or shooter (responsible for attacking and destroying the fortress). The two members underwent dynamic role changes. In CT (testbed #67), participants were allowed to freely form groups of two or three to deliver packages based on their evaluations of teammates. After completing the delivery of a single package, the team was disbanded, and participants selected new teammates or decided to be on their own for the next delivery. Lastly, in CHATboard (testbed #83), task allocation was also temporary and dynamic. After each round, the human participant decided which and how many tasks the agent should perform, based on their cumulative evaluations and perceptions of the agent’s performance.

Testbed Accessibility, Wizard-of-Oz (WoZ), and Agent Implementation

Among the 103 testbeds analyzed in this study, seventeen were made publicly accessible through open links (testbeds #1, 17, 29, 44, 45, 53, 56, 61, 63, 64, 66, 67, 72, 76, 95, 96, 103). These accessible testbeds and their corresponding links are listed in Appendix A.

Regarding the agent implementation, nineteen testbeds were identified in which at least one associated paper explicitly mentioned that the experiment employed the Wizard of Oz technique to implement agent roles (testbeds #2, 16, 28, 36, 37, 39, 40, 55, 58, 63, 64, 67, 70, 72, 78, 84, 86, 88, 98). In these cases, a deception study was used to ensure that participants believed they were interacting with a virtual agent, although the interaction was actually mediated by a human operator.

Additionally, all the identified testbeds have been actively used to analyze team outcomes. While the names of the measures are often highly task-specific (e.g., “proportion of correctly identified victims” and “path trajectory similarity”), they can be broadly categorized into performance-, human/agent behavior-, and communication-related metrics.

For performance, common measures included task completion time, accuracy, scores, and error rates. For behavior, human-related measures focused on reliance on and compliance with agents, as well as task delegation strategies. Agent-related measures typically included actual task performance and reliability. Regarding communication, many testbeds measured total communication frequency. Some studies further analyzed time spent communicating with team members, the differentiated amounts of information pulled versus pushed, and the length of communication messages. In many cases involving text-based communication, raw communication logs were saved to support both quantitative and qualitative analyses.

However, it should be noted that the measures reported in the papers may not fully reflect all metrics available from the testbeds, as the dependent variables reported are often closely tied to the specific research questions of each study.

Discussion

Studies on HATs have utilized a variety of testbeds, each uniquely developed and employed by different research teams and featuring distinctive characteristics. We review testbeds used in prior research and identify significant gaps in the current design. It is important to note that while our analysis is testbed-oriented, the design of testbeds and the goals of HAT studies are closely coupled. Therefore, our analysis will also inform future research directions.

Existing Testbeds and Research, and Future Directions

In terms of team composition, most testbeds involve only one human participant, with the majority of them having one agent (1–1, Table 3). Among the testbeds featuring multiple agents (1-M or M-M), many of the agents function as subordinate autonomous vehicles with minimal engagement, performing assigned tasks without actively interacting with humans (e.g., Chien et al., 2016; Mercado et al., 2016). Therefore, future research could focus on developing and exploring more diverse types of agents. Additionally, few existing testbeds support configurations in which multiple humans and multiple agents work together (M-M); only eight do so (testbeds #18, 20, 23, 31, 38, 41, 67, 73). Even in these testbeds, the multiple agents are mostly limited to decision-support displays (testbed #18) and subordinate vehicles (testbeds #20, 23, 31, 38).

Although implementing multiple agents and enabling real-time, seamless connectivity between multiple humans and agents present significant challenges, future research should aim to develop M-M testbeds and examine complex teaming scenarios involving multiple agents with advanced capabilities and functionalities (Karwowski et al., 2025; Wildman et al., 2024). This direction offers two clear benefits: (1) it allows HAT research to more accurately reflect real-world systems without oversimplification, thereby enhancing applicability, and (2) it addresses critical research questions that inherently require such configurations. For instance, it enables the examination of inter-team collaboration within organizations. As demonstrated by HyForm (testbed #73), an M-M configuration supports experiments on the adoption of autonomous technologies across various sectors within a company. Additionally, this setup can help us understand how multiple humans collectively develop strategies for effectively utilizing multiple autonomous agents. USAR_MrCS (testbed #41), which allowed two participants to work with multiple search and rescue robots operating based on an autonomous path planner, serves as a good example.

Regarding task interdependence, the majority of testbeds exhibit intensive type when the team size is three or greater, and reciprocal type when the team size is two (Table 4). Around 19% of the testbeds were classified as sequential, primarily due to the turn-taking nature of the tasks, such as in assembly line settings. Another contributing factor was the limited autonomy of the agents, which often acted as decision aids that provided recommendations either before or in response to human decision making, resulting in a clear sequential order in the interactions. Still, the predominance of intensive and reciprocal configurations suggests that the overall HAT research landscape is oriented toward and interested in exploring higher degrees of interdependence in human–agent interactions.

Concerning role structure, functional type is more common than divisional (Table 5), which implies that team members are assigned different tasks that are not interchangeable. This aligns with the earlier observation that agents in some of the testbeds possess lower levels of autonomy. As a result of these limitations, humans assume a more pivotal role, while the agents take on subordinate roles, executing commands issued by the humans.

For leadership structure and leadership role assignment, the designated and none categories of leadership structure are most prevalent, with a human participant typically assuming the leader role within teams (Table 6). Once again, the predominance of human leadership role assignments reflects the limited autonomy of agents in many testbeds.

Despite the real-world advancements in highly autonomous technologies, including generative AI, many agents in current testbeds remain limited to roles that involve receiving commands from humans and operating in a highly structured and rule-based manner. To enable agents to assume a leadership role, they must demonstrate significantly higher levels of intelligence, capable of independently planning, issuing appropriate commands to humans, and interpreting human intentions to provide contextually relevant guidance that enhances task efficiency.

With the implementation of such advanced intelligent agents, we envision that a broader range of leadership structures could be explored to study team dynamics and performance. Investigating how team dynamics and outcomes are influenced when an agent assumes the leader role, or when humans and agents collaborate as co-leaders, could yield valuable insights. Furthermore, introducing novel leadership structures into larger teams would facilitate complex team dynamics, enabling a deeper exploration of complex hierarchies and power dynamics among members.

The analysis of communication structures reveals that dyadic, hub-and-wheel, and star configurations are commonly observed in the testbeds, while the chain structure is absent (Table 7). The prevalence of the hub-and-wheel structure closely reflects the abundance of military-domain tasks, where a human, serving as the “hub,” performs target identification/reconnaissance/surveillance tasks by controlling multiple autonomous agents. The star structure represents cases where multiple teammates collaborate on assembly/transportation/manufacturing-related or search and rescue tasks, which necessitate free-flowing communication between all pairs of teammates.

The absence of the chain structure can be attributed to the relative lack of testbeds simulating large-scale HATs or even organizations, for instance, human–agent teaming in a multi-echelon network. Multi-echelon networks are commonly found in large-scale organizations, where nodes (human or agent) are hierarchically or sequentially arranged across layers (echelons), and direct interactions typically occur only between adjacent layers. A representative example is an emergency response system: first aid team members interact with unmanned aerial or ground vehicles for search and rescue operations, report to their team leader, who in turn reports to the incident commander. In nearly all reviewed testbeds, group size was limited to a maximum of six members. With such a small-scale setting, unrestricted communication among all members is feasible and perhaps more effective. However, as the size of the team or organization increases, more complex organizational forms, such as multi-echelon networks, are likely to emerge, in which the chain structure becomes more relevant and applicable.
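
To make the scaling argument concrete, the sketch below builds three communication structures as sets of undirected links and counts the links for several team sizes. It assumes, following the description above, that the star structure means free-flowing communication between all pairs of teammates; the function names, member labels, and the team sizes beyond six are hypothetical choices for illustration.

```python
from itertools import combinations


def hub_and_wheel(members):
    """Hub-and-wheel: the first member is the hub; all others link only to it."""
    hub, *others = members
    return {frozenset((hub, other)) for other in others}


def all_pairs(members):
    """Free-flowing communication: every pair of members shares a link."""
    return {frozenset(pair) for pair in combinations(members, 2)}


def chain(members):
    """Chain: each member links only to its neighbours in the given order."""
    return {frozenset(pair) for pair in zip(members, members[1:])}


def link_counts(team_size):
    """Number of communication links per structure for a team of this size."""
    members = [f"member_{i}" for i in range(team_size)]
    return {
        "hub_and_wheel": len(hub_and_wheel(members)),
        "all_pairs": len(all_pairs(members)),
        "chain": len(chain(members)),
    }


# Hypothetical multi-echelon example mirroring the emergency response case:
# only adjacent echelons communicate directly.
echelons = ["search_vehicle", "first_aid_member", "team_leader", "incident_commander"]
emergency_chain = chain(echelons)

for size in (6, 12, 24):
    print(size, link_counts(size))
```

With all-pairs communication the number of links grows quadratically with team size, whereas the chain and hub-and-wheel structures grow linearly. The link count alone does not separate those two: in the chain, information between non-adjacent members must pass through intermediaries, which is the property that makes it relevant to multi-echelon settings.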

Considering the results in conjunction with team composition, there is potential for further exploration of communication dynamics, particularly as the team size increases. As various communication strategies can emerge within larger teams (Butchibabu et al., 2016), the M-M configuration could enable a discovery of new forms of communication dynamics in HATs.

For example, the hub-and-wheel type can be further explored with more complex team configurations. Currently, in all testbeds that are classified as hub-and-wheel, the majority of agents are subordinate autonomous vehicles that simply follow a human’s commands and controls. A new type of hub-and-wheel structure can be evaluated, such as a team consisting of multiple humans and agents (M-M), where one human serves as the hub connecting with multiple human and agent teammates at the same hierarchical level. This setup would allow researchers to investigate how the human hub interacts with other humans and agents. One testbed (Combat_Sim, testbed #23) characterizes a similar type of team, consisting of one human commander and six human crew members. However, the two agents in this setup are robotic combat vehicles controlled by the human crews. Accordingly, the setup does not allow for the exploration of how the human commander differentially interacts with and assigns tasks to human and agent team members.

Examining the chain structure through inter-team communication is also a promising avenue. For instance, by employing a setup similar to HyForm (testbed #73), an organization may be structured across three levels, such as supply ordering and supply chain management, the production line, and sales and business operations. While free-flow communication across all sectors offers certain advantages, it can also result in inefficiencies and delays, particularly when decisions require consensus among all parties. In such cases, a chain communication structure may be more effective. The success of this structure often hinges on who occupies the central position in the communication chain and how effectively they manage the flow of information. Investigating this dynamic in teams that include agent teammates presents a novel direction.

Regarding communication direction among humans and among agents, relatively straightforward patterns were observed. Communication between human pairs was bidirectional, whereas communication between agent pairs was either bidirectional or not explicitly defined. However, it is important to note that these classifications may become significantly more complex with larger team sizes. As team size increases, the communication patterns across the possible human and agent pairs can vary widely.

In terms of communication medium, most testbeds employed relatively traditional forms, such as text or voice message exchange (Table 8). A wider range of channels and methods could be developed and explored. For example, incorporating haptic interfaces or image processing technologies would allow humans to convey information through facial expressions or gestures. Such advancements would enhance the richness and versatility of communication within HATs.

For physical distribution, the relatively small proportion of mixed configuration compared to colocated once again reflects the relatively small team sizes featured in the testbeds (Table 9). As team sizes increase, it becomes infeasible for all members to remain colocated. In either distributed or mixed configurations, research on effective communication strategies that facilitate teamwork and coordination will become increasingly important.

Concerning team life span, most current studies focus on evaluating performance and coordination within a single, fixed team configuration; all but three testbeds in this review are classified as long-term. This points to a critical gap in understanding how HATs operate in more dynamic contexts, where both human and agent teammates may be reassigned or self-select into new ad hoc teams. Emerging work has begun to draw on organizational models such as the “team of teams” framework (McChrystal et al., 2015), which emphasizes fluid HAT composition and the continual reconfiguration of members as problem contexts evolve (Guo et al., 2023a, 2024). To advance this line of research, it would be valuable to examine why and how HATs evolve over time, particularly as a function of task demands, individual preferences, or other contextual factors. Longitudinal studies that extend beyond a single session would provide the temporal resolution necessary to investigate such dynamics.

In summary, our study highlights the need for further exploration of team attributes across all dimensions addressed in this research. The classification results presented in this paper can serve as a valuable reference for HAT researchers in identifying the appropriate types of team characteristics to incorporate into their testbeds. Testbeds that implemented relatively complex and advanced agent functionalities or enabled less common configurations such as M-M setups can serve as strong benchmarks for future work. Additionally, as an alternative to developing multiple high-autonomy agents, the use of the WoZ technique can help address specific research objectives and facilitate early-stage investigations (Dahlbäck et al., 1993).

Designing Future Testbeds

The previous section presents the need to develop new testbeds to enable the examination of more sophisticated teaming scenarios. Even though identifying characteristics of an “ideal” next-generation HAT testbed is out of the scope of this literature review, we do note some important characteristics: flexibility, open-source accessibility, and modularity.

Advocating for Flexible Testbeds

In our classification, we double-coded testbeds into all applicable categories and noted those assigned to multiple classifications in Tables 3–9. These testbeds can be considered flexible testbeds, offering researchers the freedom to design and explore different types of team configurations. Such flexibility enables the use of adaptable features as key independent variables to examine their effects on team performance, teammate satisfaction, and other outcomes.

However, there are relatively few double-coded examples. This aligns with the observation that much of the existing HAT literature has focused on comparing the performance of all-human teams to that of HATs (e.g., Fan et al., 2005; Harriott et al., 2012), with limited attention given to examining how specific team characteristics influence team outcomes.

To address this gap, we advocate for the design of testbeds that allow for flexible manipulation of team characteristics. Alternatively, existing testbeds could be extended to support such variability. Both approaches would open up valuable research opportunities to systematically investigate team dynamics across a wide range of configurations within a consistent experimental framework.

Testbeds that were double-coded in Tables 3–9 can serve as good examples. For instance, TRESCHU (testbed #38) offered flexibility in both task interdependence and role structure. In one configuration, all autonomous vehicles of a given type were assigned to a single human operator, who then became fully responsible for tasks requiring that vehicle type. This created a sequential task dependency, where team members had to wait for the operator’s completion, and established a functional role structure. In an alternative configuration, each operator was assigned one vehicle of each type, enabling any operator to complete any task as needed. This arrangement resulted in intensive task interdependence between all humans and autonomous agents and a divisional role structure. Likewise, testbeds can be developed to allow experimenters to freely manipulate specific roles, access permissions to team resources, and other attributes for each team member during the configuration setup phase.
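
The two vehicle-assignment configurations described above can be sketched as a single configuration switch. This is a hypothetical illustration, not TRESCHU's actual implementation: the function names, operator labels, and vehicle types (`type_a`, `type_b`, `type_c`) are placeholders, and the sketch assumes as many operators as vehicle types and as many vehicles per type as operators.

```python
from collections import defaultdict


def assign_by_type(operators, vehicles):
    """Functional roles: each operator controls every vehicle of one type."""
    types = sorted({vtype for _, vtype in vehicles})
    if len(types) != len(operators):
        raise ValueError("need exactly one operator per vehicle type")
    owner = dict(zip(types, operators))
    assignment = defaultdict(list)
    for name, vtype in vehicles:
        assignment[owner[vtype]].append(name)
    return dict(assignment)


def assign_one_of_each(operators, vehicles):
    """Divisional roles: each operator controls one vehicle of every type."""
    by_type = defaultdict(list)
    for name, vtype in vehicles:
        by_type[vtype].append(name)
    assignment = {operator: [] for operator in operators}
    for vtype in sorted(by_type):
        if len(by_type[vtype]) != len(operators):
            raise ValueError("need one vehicle of each type per operator")
        for operator, name in zip(operators, by_type[vtype]):
            assignment[operator].append(name)
    return assignment


# Placeholder team: three operators, three vehicles of each of three types.
operators = ["operator_1", "operator_2", "operator_3"]
vehicles = [
    (f"{vtype}_{i}", vtype)
    for vtype in ("type_a", "type_b", "type_c")
    for i in (1, 2, 3)
]

functional = assign_by_type(operators, vehicles)
divisional = assign_one_of_each(operators, vehicles)
print(functional["operator_1"])
print(divisional["operator_1"])
```

Exposing the assignment rule as a setup-phase parameter is one way a testbed could let experimenters treat role structure as an independent variable while keeping the task and interface unchanged.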

Advocating for Designing Open-Source Testbeds

In the field of HATs, it is common for research teams to develop their own testbeds tailored to the specific needs of their studies. Unfortunately, many testbeds remain closed or are shared with only a limited group of researchers. This lack of accessibility restricts the broader research community’s ability to replicate findings, build on prior work, and accelerate innovation. It also creates inefficiencies, as other researchers may need to reinvent the wheel.

A great open-source example, although not considered as an HAT testbed, is the Multi-Attribute Task Battery (MATB) (Comstock & Arnegard, 1992). MATB is a widely used tool for assessing human multi-tasking performance. Originally developed by NASA, the MATB has been continuously improved over the years and remains easily accessible to researchers. Notably, the open-source version, OpenMATB (Cegarra et al., 2020), became available recently, allowing researchers to customize and utilize the testbed freely.

Another great example is the BW4T testbed (Johnson et al., 2009), developed at the Delft University of Technology (TU Delft) as an open-source testbed from the beginning. The testbed has been used by many researchers to study team coordination among human–human, agent–agent, and human–agent teams (e.g., Harbers, Bradshaw, Johnson, Feltovich, van den Bosch, & Meyer, 2011, Harbers, Bradshaw, Johnson, Feltovich, Van Den Bosch, & Meyer, 2011; Butchibabu et al., 2016).

A major hurdle in using open-source testbeds is that they may lack the flexibility or customization options needed for a specific study. For example, an open-source testbed may have predefined tasks, environments, or agent behaviors that cannot be easily modified. To address this challenge, one potential solution is to use modular design (Schilling, 2000). Modular design is a design theory that subdivides a system into smaller elements (i.e., modules), which can be independently created, modified, replaced, or exchanged between different systems. A testbed can be structured into different modules, for example, a task module that defines the task scenarios or activities, an agent module that defines agent behaviors including underlying algorithms, a graphic interface module that manages the interface between human and autonomous agents, a communication module that manages communication channels and medium, and a performance measure module that tracks various dependent variables of interest.
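
As one possible reading of this modular structure, the sketch below defines the five modules as small interfaces and composes them into a testbed. All class and method names are hypothetical and chosen for illustration; the toy task (scoring a point when the human and agent pick the same action) is intentionally trivial and does not correspond to any reviewed testbed.

```python
from dataclasses import dataclass, field
from typing import Protocol


class TaskModule(Protocol):
    def initial_state(self) -> dict: ...
    def step(self, state: dict, human_action: str, agent_action: str) -> dict: ...


class AgentModule(Protocol):
    def act(self, state: dict) -> str: ...


class InterfaceModule(Protocol):
    def render(self, state: dict) -> str: ...


class CommunicationModule(Protocol):
    def send(self, sender: str, message: str) -> None: ...


class MeasureModule(Protocol):
    def record(self, state: dict) -> None: ...


# Toy implementations (hypothetical): each can be swapped independently.
class MatchingTask:
    """Scores one point whenever human and agent choose the same action."""

    def initial_state(self) -> dict:
        return {"round": 0, "score": 0}

    def step(self, state: dict, human_action: str, agent_action: str) -> dict:
        return {
            "round": state["round"] + 1,
            "score": state["score"] + int(human_action == agent_action),
        }


class FixedAgent:
    """Agent that always returns the same action."""

    def __init__(self, action: str) -> None:
        self.action = action

    def act(self, state: dict) -> str:
        return self.action


class TextInterface:
    def render(self, state: dict) -> str:
        return f"round {state['round']}: score {state['score']}"


class MessageLog:
    """Communication channel that stores raw messages for later analysis."""

    def __init__(self) -> None:
        self.messages = []

    def send(self, sender: str, message: str) -> None:
        self.messages.append((sender, message))


class ScoreTracker:
    """Performance measure that records the score after every round."""

    def __init__(self) -> None:
        self.scores = []

    def record(self, state: dict) -> None:
        self.scores.append(state["score"])


@dataclass
class ModularTestbed:
    task: TaskModule
    agent: AgentModule
    interface: InterfaceModule
    channel: CommunicationModule
    measures: MeasureModule
    state: dict = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.task.initial_state()

    def play_round(self, human_action: str) -> str:
        agent_action = self.agent.act(self.state)
        self.channel.send("agent", f"I chose {agent_action}")
        self.state = self.task.step(self.state, human_action, agent_action)
        self.measures.record(self.state)
        return self.interface.render(self.state)


testbed = ModularTestbed(
    MatchingTask(), FixedAgent("left"), TextInterface(), MessageLog(), ScoreTracker()
)
for choice in ("left", "right", "left"):
    print(testbed.play_round(choice))
```

Because the testbed depends only on the interfaces, a study could replace `FixedAgent` with a learning agent, or `MessageLog` with a voice channel, without touching the task or the measures.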

In addition, we acknowledge that making a testbed publicly available is not always feasible, especially when its development is closely tied to specific hardware, proprietary datasets, or confidential project agreements. In such cases, it remains essential for researchers to provide clear and transparent descriptions of the teaming scenarios involved. Doing so would facilitate meaningful comparisons with other studies employing similar team configurations, even when different testbeds are used. The taxonomy proposed in this work can serve as a useful reference for researchers when outlining the team characteristics of their testbeds.

Practical Applications of the HAT Taxonomy

Beyond organizing the existing literature, the proposed taxonomy can serve as a practical framework for guiding future research in HATs. Researchers and designers can use the taxonomy in multiple ways. First, for testbed development, the taxonomy can help identify which attributes to incorporate or modify when designing new testbeds or refining existing ones. Second, in experimental design, the taxonomy can support the ideation of new HAT research questions by highlighting which team characteristics are worth comparing, helping researchers formulate hypotheses and design their studies accordingly. Third, the taxonomy can be used as a standardized reporting guideline when describing testbeds in future publications, allowing researchers to clearly position their work within the broader landscape. This reduces ambiguity about key team attributes and ensures that other researchers can easily interpret the scope and capabilities of a given testbed. Lastly, the taxonomy can serve as a foundation for meta-analyses, enabling researchers to systematically examine how specific sets of team attributes relate to HAT outcomes. Overall, the taxonomy not only supports the synthesis of past work but also serves as a practical tool throughout the entire HAT research process, from testbed development and study design to positioning the study within the broader literature.

Limitations and Future Research

The study should be viewed in light of the following limitations. First, our work focused exclusively on virtual agents, excluding physically embodied agents. A key advantage of software-based testbeds is their greater flexibility in modeling diverse interaction and collaboration scenarios without hardware or deployment constraints. Additional reviews could be conducted to incorporate embodied agents.

Second, the proposed team classification taxonomy and the team attributes analyzed in this review are primarily centered on the characteristics of the team task itself. While this study adapts the well-established human team taxonomy by Wildman et al. (2024), we make no claim that our taxonomy is exhaustive. Future work could incorporate additional dimensions to capture other critical aspects of HAT. For instance, dynamic team processes, which reflect how team coordination and structure evolve over time, could be an important dimension, especially as the nature of HATs becomes more advanced and supports prolonged, complex cooperation. Additionally, although the review briefly discusses team evaluation metrics as reported in prior HAT research, a more in-depth synthesis and analysis of HAT measurement approaches would be a valuable direction for future review work.

Third, the scope of the review was to examine the distribution of existing testbeds in terms of team characteristics, rather than to conduct a meta-analysis of the results to address questions such as which team compositions lead to optimal performance, higher trust, or greater satisfaction. By leveraging the findings from this review, future research can refine research questions, identify specific team attributes of interest, and focus on synthesizing relevant results from existing studies. This approach would facilitate a comparative evaluation of different categories within each attribute, helping to identify trends in the findings.

Fourth, the field of HAT research is rapidly evolving. With advances in human-level (i.e., the ability to learn from experience, adapt to new situations, handle abstract concepts, and apply knowledge effectively) (Sternberg, 1982) or even superhuman-level (i.e., surpassing the cognitive performance of humans in virtually all domains of interest) (Bostrom, 1998) artificial intelligence, future HATs may take forms that are unprecedented in either human–human teams or existing HAT configurations. As these new forms emerge, additional dimensions may need to be incorporated into the taxonomy to adequately capture their characteristics. In this sense, our review is not intended as a final synthesis, but rather as a starting point that future reviews can build upon as the field progresses.

Finally, extending the fourth point, new testbeds are continuously being developed and studied. This review should be viewed as an archival snapshot based on research records that meet the inclusion criteria and were published up to 2024. Given the growing interest and rapid development in this field, the review needs to be updated in the future to reflect more recent studies. For instance, additional testbeds have been featured in recent work, such as the Scout Exploration Game (Xu et al., 2025) and Mass Evacuation Testbed (Chung et al., 2025; Chung & Yang, 2025b, 2025d).

Conclusion

With advances in machine learning, artificial intelligence, and robotics, agents with human-level intelligence or even superintelligence are no longer just a concept but a real possibility. As the agents are becoming more capable, integrating them effectively with human teams presents not only opportunities but also profound challenges. Consequently, HAT has emerged as a topic of significant research interest. Compared to existing literature review papers, we took a different approach by focusing on the testbeds themselves.

This study reviewed the HAT literature to provide a comprehensive understanding of the available testbeds in this field. Initially, we developed a team classification taxonomy to analyze HATs, adapting an existing scheme used for human–human teams. This scheme was then applied to analyze 103 testbeds utilized in 235 empirical studies. Our study not only identified the distribution of existing HAT testbeds but also highlighted areas that require further investigation. For instance, we found that a significant portion of the literature on HATs focuses on teams consisting of one human and one agent, with humans typically assuming leadership roles. Moreover, the dynamics within these teams tended to remain static over time. Our findings underscore the importance of further research into diverse team attributes, such as team composition, leadership structure, and communication structure, direction, and medium. Such efforts would facilitate a deeper understanding of more complex team dynamics in HATs, potentially leading to more effective collaborations.

Key Points

• We developed a taxonomy for classifying human–agent teams (HATs) by modifying the existing framework for human teams.

• Using the taxonomy, we analyzed 103 HAT testbeds identified from 235 empirical research studies.

• The frequency analysis indicates that existing research studies are centered on certain team types, emphasizing the importance of exploring new forms of HATs.

• Testbeds that allow flexibility in manipulating various team characteristics-related features would greatly enrich research in HATs.

• The proposed taxonomy can serve as a practical framework to guide testbed development, experimental design, and standardized reporting in HAT research.

Supplemental Material

Supplemental Material for A Systematic Review and Taxonomy of Human–Agent Teaming Testbeds by Hyesun Chung, Timothy Holder, Julie A. Shah and X. Jessie Yang in Human Factors.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation under Grant No. 2045009 and the Air Force Office of Scientific Research under grant number FA9550-23-1-0044.

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Hyesun Chung is a PhD candidate in the Department of Industrial and Operations Engineering at the University of Michigan, Ann Arbor. She received her BS degree in industrial engineering, BFA in design, and BBA in 2020, and MS in industrial engineering in 2022, all from Seoul National University.

Timothy Holder is a postdoctoral fellow in the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology. He obtained his PhD in biomedical engineering from North Carolina State University and the University of North Carolina at Chapel Hill in 2022.

Julie A. Shah is the H.N. Slater Professor in the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology. She obtained her PhD in aeronautics and astronautics engineering from MIT in 2011.

X. Jessie Yang is an associate professor in the Department of Industrial and Operations Engineering at the University of Michigan, Ann Arbor. She obtained a PhD in mechanical and aerospace engineering (human factors) from Nanyang Technological University Singapore in 2014.

References

1. Abuhaimed, Karaoglu, & Sen (2023). Choosing the task allocator: Effect on performance and satisfaction in human-agent team. The International FLAIRS Conference Proceedings, 36(1). https://doi.org/10.32473/flairs.36.133310

2. Abuhaimed & Sen (2022a). Evaluating human and agent task allocators in ad hoc human-agent teams. In Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (Vol. 13549, pp. 167–184). https://doi.org/10.1007/978-3-031-20845-4_11

3. Abuhaimed & Sen (2022b). Effective task allocation in ad hoc human-agent teams. Frontiers in Artificial Intelligence and Applications, 354, 171–183. https://doi.org/10.3233/FAIA220197

4. Abuhaimed & Sen (2023). Influence of expertise complementarity on ad hoc human-agent team effectiveness. In Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (Vol. 13753, pp. 679–688). https://doi.org/10.1007/978-3-031-21203-1_46

Abuhaimed

Sen

(2024). Team performance and user satisfaction in mixed human-agent teams. In Proceedings of the international joint conference on autonomous agents and multiagent systems (pp. 4–12). AAMAS.

Agbeyibor

Ruia

Cortes

C. J.

Kolb

Vela

Coogan

Feigh

(2024a). Run time assurance and human AI fluency in crewed autonomous intelligence surveillance and reconnaissance. In AIAA aviation forum and ASCEND 2024. https://doi.org/10.2514/6.2024-4496

Agbeyibor

Ruia

Kolb

Feigh

K. M.

(2024b). Joint intelligence, surveillance, and reconnaissance mission collaboration with autonomous pilots. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 68(1), 409–415. https://doi.org/10.1177/10711813241262302

Ahmed

de Visser

Shaw

Mohamed-Ameen

Campbell

Parasuraman

(2014). Statistical modelling of networked human-automation performance using working memory capacity. Ergonomics, 57(3), 295–318. https://doi.org/10.1080/00140139.2013.855823

Alarcon

G. M.

Lyons

J. B.

Hamdan

I. A.

Jessup

S. A.

(2024). Affective responses to trust violations in a human-autonomy teaming context: Humans versus robots. International Journal of Social Robotics, 16(1), 23–35. https://doi.org/10.1007/s12369-023-01017-w

10.

Albayram

Jensen

Khan

M. M. H.

Fahim

M. A. A.

Buck

Coman

(2020). Investigating the effects of (empty) promises on human-automation interaction and trust repair. In Proceedings of the 8th international conference on human-agent interaction (pp. 6–14). Association for Computing Machinery.

11.

Al-Hussaini

Guan

Gregory

J. M.

Pollard

Khooshabeh

Gupta

S. K.

(2024). Assessing the impact of alerts on the human supervisor’s decision-making performance in multi-robot missions. ACM Transactions on Human-Robot Interaction, 14(1), 1–40. https://doi.org/10.1145/3689828

12.

Attig

Wollstadt

Schrills

Franke

Wiebel-Herboth

C. B.

(2024). More than task performance: Developing new criteria for successful human-AI teaming using the cooperative card game Hanabi. In Extended abstracts of the chi conference on human factors in computing systems (pp. 1–11). Association for Computing Machinery.

13.

Atweh

J. A.

Hayek

J. A.

Riggs

S. L.

(2023a). Quantifying visual attention of teams during workload transitions using aoi-based cross-recurrence metrics. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 67(1), 1882–1887. https://doi.org/10.1177/21695067231193683

14.

Atweh

J. A.

Hazimeh

Riggs

S. L.

(2023b). Can real-time gaze sharing help team collaboration? A preliminary examination of its effectiveness with pairs. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 67(1), 716–721. https://doi.org/10.1177/21695067231193659

15.

Atweh

J. A.

Riggs

S. L.

(2024a). Scanmatch versus multimatch: A comparison of the sensitivity of Scanpath similarity metrics to changes in workload. In 2024 systems and information engineering design symposium (SIEDS), Charlottesville, VA, USA, 03–03 May 2024, pp. 57–62.

16.

Atweh

J. A.

Riggs

S. L.

(2024b). Gaze sharing, a double-edged sword: Examining the effect of real-time gaze sharing visualizations on team performance and situation awareness. Human Factors, 67(3), 196–224. https://doi.org/10.1177/00187208241272060

17.

Ball

Myers

Heiberg

Cooke

N. J.

Matessa

Freiman

Rodgers

(2010). The synthetic teammate project. Computational & Mathematical Organization Theory, 16(3), 271–299. https://doi.org/10.1007/s10588-010-9065-3

18.

Barber

Davis

Nicholson

Finkelstein

Chen

J. Y.

(2008). The mixed initiative experimental (mix) testbed for human robot interactions with varied levels of automation. In Proceedings of the 26th army science conference (pp. 1–4).

19.

Barg-Walkow

L. H.

Rogers

W. A.

(2016). The effect of incorrect reliability information on expectations, perceptions, and use of automation. Human Factors, 58(2), 242–260. https://doi.org/10.1177/0018720815610271

20.

Bell

B. S.

Kozlowski

S. W.

(2002). A typology of virtual teams: Implications for effective leadership. Group and Organization Management, 27(1), 14–49. https://doi.org/10.1177/1059601102027001003

21.

Bendell

Williams

Fiore

S. M.

Jentsch

(2024). Individual and team profiling to support theory of mind in artificial social intelligence. Scientific Reports, 14(1), 12635. https://doi.org/10.1038/s41598-024-63122-8

22.

Bhat

Lyons

J. B.

Shi

Yang

X. J.

(2022). Clustering trust dynamics in a human-robot sequential decision-making task. IEEE Robotics and Automation Letters, 7(4), 8815–8822. https://doi.org/10.1109/LRA.2022.3188902

23.

Bhat

Lyons

J. B.

Shi

Yang

X. J.

(2024a). Evaluating the impact of personalized value alignment in human-robot interaction: Insights into trust and team performance outcomes. In Proceedings of the 2024 ACM/IEEE international conference on human-robot interaction (pp. 32–41). Association for Computing Machinery.

24.

Bhat

Lyons

J. B.

Shi

Yang

X. J.

(2024b). Value alignment and trust in human-robot interaction: Insights from simulation and user study. In Vinjamuri

(ed), Discovering the frontiers of human-robot interaction (pp. 39–63). Springer. https://doi.org/10.1007/978-3-031-66656-8_3

25.

Bhatti

Demir

Cooke

N. J.

Johnson

C. J.

(2021). Assessing communication and trust in an AI teammate in a dynamic task environment. In 2021 IEEE 2nd international conference on human-machine systems (ICHMS), Magdeburg, Germany, 08–10 September 2021, pp. 1–6.

26.

Bindewald

J. M.

Miller

M. E.

and

G. L. P.

(2020). Creating effective automation to maintain explicit user engagement. International Journal of Human-Computer Interaction, 36(4), 341–354. https://doi.org/10.1080/10447318.2019.1642618

27.

Bishop

Burgess

Ramos

Driggs

J. B.

Williams

Tossell

C. C.

Visser

E. J. d.

(2020). CHAOPT: A testbed for evaluating human-autonomy team collaboration using the video game overcooked!2. In 2020 systems and information engineering design symposium (SIEDS), Charlottesville, VA, USA, 24–24 April 2020, pp. 1–6.

28.

Bobko

Hirshfield

Eloy

Spencer

Doherty

Driscoll

Obolsky

(2023). Human-agent teaming and trust calibration: A theoretical framework, configurable testbed, empirical illustration, and implications for the development of adaptive systems. Theoretical Issues in Ergonomics Science, 24(3), 310–334. https://doi.org/10.1080/1463922X.2022.2086644

29.

Bostrom

(1998). How long before superintelligence. International Journal of Futures Studies, 2(1), 1–9.

30.

Brandt

S. L.

Lachter

Russell

Shively

R. J.

(2018). A human-autonomy teaming approach for a flight-following task. Advances in Intelligent Systems and Computing, 586, 12–22. https://doi.org/10.1007/978-3-319-60642-2_2

31.

Bubb-Lewis

Scerbo

M. W.

(2002). The effects of communication modes on performance and discourse organization with an adaptive interface. Applied Ergonomics, 33(1), 15–26. https://doi.org/10.1016/s0003-6870(01)00046-1

32.

Butchibabu

Sparano-Huiban

Sonenberg

Shah

(2016). Implicit coordination strategies for effective team communication. Human Factors, 58(4), 595–610. https://doi.org/10.1177/0018720816639712

33.

Candon

Hsu

Kim

Chen

Tsoi

Vázquez

(2022). Perceptions of the helpfulness of unexpected agent assistance. In HAI 2022 - proceedings of the 10th conference on human-agent interaction (pp. 41–50). Association for Computing Machinery. https://doi.org/10.1145/3527188.3561915

34.

Capiola

Hamdan

I. A.

Lyons

J. B.

Lewis

Alarcon

G. M.

Sycara

(2024). The effect of asset degradation on trust in swarms: A reexamination of system-wide trust in human-swarm interaction. Human Factors, 66(5), 1475–1489. https://doi.org/10.1177/00187208221145261

35.

Capiola

Johnson

Hamdan

I. A.

Lyons

J. B.

Fox

E. L.

(2023a). Detecting swarm degradation: Measuring human and machine performance. In International conference on human-computer interaction (pp. 325–343). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-35634-6_23

36.

Capiola

Lyons

J. B.

Harris

K. N.

Hamdan

I. A.

Kailas

Sycara

(2023b). Do what you say?” the combined effects of framed social intent and autonomous agent behavior on the trust process. Computers in Human Behavior, 149, 107966. https://doi.org/10.1016/j.chb.2023.107966

37.

Cegarra

Valéry

Avril

Calmettes

Navarro

(2020). OpenMATB: A multi-attribute task battery promoting task customization, software extensibility and experiment replicability. Behavior Research Methods, 52(5), 1980–1990. https://doi.org/10.3758/s13428-020-01364-w

38.

Centeio Jorge

Bouman

N. H.

Jonker

C. M.

Tielman

M. L.

(2023). Exploring the effect of automation failure on the human’s trustworthiness in human-agent teamwork. Frontiers in Robotics and AI, 10, 1143723. https://doi.org/10.3389/frobt.2023.1143723

39.

Chen

J. Y.

Barnes

M. J.

(2010). Supervisory control of robots using roboleader. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 54(19), 1483–1487. https://doi.org/10.1177/154193121005401927

40.

Chen

J. Y.

Barnes

M. J.

(2012). Supervisory control of multiple robots: Effects of imperfect automation and individual differences. Human Factors, 54(2), 157–174. https://doi.org/10.1177/0018720811435843

41.

Chen

J. Y.

Barnes

M. J.

Kenny

(2011a). Effects of imperfect automation on operator’s supervisory control of multiple robots (No. ARLTR5643). https://doi.org/10.21236/ADA552060

42.

Chen

J. Y.

Barnes

M. J.

(2010a). RoboLeader: An agent for supervisory control of multiple robots. In 2010 5th ACM/IEEE international conference on human-robot interaction (HRI), Osaka, Japan, 02–05 March 2010, pp. 81–82. https://doi.org/10.1109/HRI.2010.5453261

43.

Chen

J. Y.

Barnes

M. J.

(2010b). RoboLeader: A surrogate for enhancing the human control of a team of robots: ARLMR-0735. Army Research Laboratory (ARL). https://doi.org/10.21236/ADA514855

44.

Chen

J. Y.

Barnes

M. J.

(2011b). RoboLeader: Dynamic re-tasking for persistence surveillance in an urban environment using robot-to-robot control. Army Research Laboratory. https://doi.org/10.21236/ADA534897

45.

Chen

J. Y.

Barnes

M. J.

Snyder

M. G.

(2010c). Roboleader: An intelligent agent for enhancing supervisory control of multiple robots (Tech. Rep). Technical report ARL-TR-5239. Army Research Laboratory. https://doi.org/10.21236/ADA514855

46.

Chen

J. Y.

Barnes

M. J.

Quinn

S. A.

Plew

(2011c). Effectiveness of RoboLeader for dynamic re-tasking in an urban environment. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 55(1), 1501–1505. https://doi.org/10.1177/1071181311551312

47.

Chen

J. Y.

Drexler

J. M.

Sciarini

L. W.

Cosenzo

K. A.

Barnes

M. J.

Nicholson

(2008). Operator workload and heart-rate variability during a simulated reconnaissance mission with an unmanned ground vehicle. In Proceedings of the 2008 army science conference. in press.

48.

Chen

J. Y.

Quinn

Wright

Barnes

Barber

Adams

(2013). Human-agent teaming for robot management in multitasking environments. In 2013 8th ACM/IEEE international conference on human-robot interaction (HRI), Tokyo, Japan, 03–06 March 2013, pp. 103–104. https://doi.org/10.1109/HRI.2013.6483522

49.

Chen

J. Y. C.

Barnes

M. J.

(2014). Human–agent teaming for multirobot control: A review of human factors issues. IEEE Transactions on Human-Machine Systems, 44(1), 13–29. https://doi.org/10.1109/THMS.2013.2293535

50.

Chien

S.-Y.

Lewis

Sycara

Liu

J.-S.

Kumru

(2016). Influence of cultural factors in dynamic trust in automation. In 2016 IEEE international conference on systems, man, and cybernetics (SMC), Budapest, Hungary, 09–12 October 2016, pp. 002884–002889. https://doi.org/10.1109/SMC.2016.7844677

51.

Chien

S.-Y.

Lewis

Sycara

Liu

J.-S.

Kumru

(2018). The effect of culture on trust in automation: Reliability and workload. ACM Transactions on Interactive Intelligent Systems, 8(4), 1–31. https://doi.org/10.1145/3230736

52.

Chiou

E. K.

Demir

Buchanan

Corral

C. C.

Endsley

M. R.

Lematta

G. J.

Cooke

N. J.

McNeese

N. J.

(2022). Towards human–robot teaming: Tradeoffs of explanation-based communication strategies in a virtual search and rescue task. International Journal of Social Robotics, 14(5), 1117–1136. https://doi.org/10.1007/s12369-021-00834-1

53.

Chiou

E. K.

Lee

J. D.

(2016). Cooperation in human-agent systems to support resilience: A microworld experiment. Human Factors, 58(6), 846–863. https://doi.org/10.1177/0018720816649094

54.

Chung

Holder

Shah

Yang

X. J.

(2024). Developing a team classification scheme for human-agent teaming. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 68(1), 10711813241260387–10711813241261399. https://doi.org/10.1177/10711813241260387

55.

Chung

Jiang

Shen

Yang

X. J.

(2025). Predicting human altruistic and compliance behaviors in multiple-operator single-agent (Mosa) interaction. International Journal of Human-Computer Interaction, 1–19. https://doi.org/10.1080/10447318.2025.2526645

56.

Chung

Yang

X. J.

(2025a). Communication dynamics and team performance in multiple-operator-multiple-agent (moma) team. In 2025 IEEE 5th international conference on human-machine systems (ICHMS) (pp. 1–6). IEEE.

57.

Chung

Yang

X. J.

(2025b). From parts to whole: How trust in AI and humans shape system trust. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (p. 10711813251360006). Sage CA: Los Angeles, CA: SAGE Publications. https://doi.org/10.1177/10711813251360006

58.

Chung

Yang

X. J.

(2025c). Trust in the team as a function of trust in individual agents: Scale validation and modeling. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (p. 10711813251358794). Sage CA: Los Angeles, CA: SAGE Publications. https://doi.org/10.1177/10711813251358794

59.

Chung

Yang

X. J.

(2025d). Understanding multi-referent trust in AI-supported evacuations: The role of transparency and altruism. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (p. 10711813251358779). Sage CA: Los Angeles, CA: SAGE Publications. https://doi.org/10.1177/10711813251358779

60.

Clare

A. S.

Cummings

M. L.

How

J. P.

Whitten

A. K.

Toupet

(2012). Operator objective function guidance for a real-time unmanned vehicle scheduling algorithm. Journal of Aerospace Computing, Information, and Communication, 9(4), 161–173. https://doi.org/10.2514/1.I010019

61.

Cohen

Imada

(2005). Agent-based training of distributed command and control teams. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 49(25), 2164–2168. https://doi.org/10.1177/154193120504902510

62.

Cohen

M. C.

Demir

Chiou

E. K.

Cooke

N. J.

(2021). The dynamics of trust and verbal anthropomorphism in human-autonomy teaming. In 2021 IEEE 2nd international conference on human-machine systems (ICHMS) (pp. 1–6). IEEE.

63.

Comstock

J. R.

Arnegard

R. J.

(1992). The multi-attribute task battery for human operator workload and strategic behavior research. (No. NAS, 1(15), 104174.

64.

Cooke

N. J.

Demir

McNeese

(2016). Synthetic teammates as team players: Coordination of human and synthetic teammates. In Cognitive engineering research institute.

65.

Cummings

Clare

Hart

(2010). The role of human-automation consensus in multiple unmanned vehicle scheduling. Human Factors, 52(1), 17–27. https://doi.org/10.1177/0018720810368674

66.

Cummings

How

J. P.

Whitten

Toupet

(2012). The impact of human–automation collaboration in decentralized multiple unmanned vehicle control. Proceedings of the IEEE, 100(3), 660–671. https://doi.org/10.1109/JPROC.2011.2174104

67.

Dahlbäck

Jönsson

Ahrenberg

(1993). Wizard of Oz studies: Why and how. Proceedings of the 1st International Conference on Intelligent User Interfaces, 6(4), 193–200. https://doi.org/10.1016/0950-7051(93)90017-N

68.

Demir

Amazeen

P. G.

Cookea

N. J.

(2020a). Examining human-autonomy team interaction and explicable behavior in a dynamic LEGO construction task. In KilicayErgin

Dagli

(Eds.), Complex adaptive systems (Vol. 168, pp. 195–201). https://doi.org/10.1016/j.procs.2020.02.270

69.

Demir

Amazeen

P. G.

McNeese

N. J.

Likens

Cooke

N. J.

(2017a). Team coordination dynamics in human-autonomy teaming. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 61(1), 236. https://doi.org/10.1177/1541931213601542

70.

Demir

Canan

Cohen

M. C.

(2023a). Modeling team interaction and decision-making in agile human–machine teams: Quantum and dynamical systems perspective. IEEE Transactions on Human-Machine Systems, 53(4), 720–730. https://doi.org/10.1109/thms.2023.3276744

71.

Demir

Canan

Cohen

M. C.

(2023b). Modeling team interaction and decision-making in agile human-machine teams: Quantum and dynamical systems perspective. IEEE Transactions on Human-Machine Systems, 53(4), 720–730. https://doi.org/10.1109/THMS.2023.3276744

72.

Demir

Cohen

Johnson

C. J.

Chiou

E. K.

Cooke

N. J.

(2023c). Exploration of the impact of interpersonal communication and coordination dynamics on team effectiveness in human-machine teams. International Journal of Human-Computer Interaction, 39(9), 1841–1855. https://doi.org/10.1080/10447318.2022.2143004

73.

Demir

Cooke

N. J.

(2014). Human teaming changes driven by expectations of a synthetic teammate. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 58(1), 16–20. https://doi.org/10.1177/1541931214581004

74.

Demir

Cooke

N. J.

Amazeen

P. G.

(2018a). A conceptual model of team dynamical behaviors and performance in human-autonomy teaming. Cognitive Systems Research, 52, 497–507. https://doi.org/10.1016/j.cogsys.2018.07.029

75.

Demir

Likens

A. D.

Cooke

N. J.

Amazeen

P. G.

McNeese

N. J.

(2019a). Team coordination and effectiveness in human-autonomy teaming. IEEE Transactions on Human-Machine Systems, 49(2), 150–159. https://doi.org/10.1109/THMS.2018.2877482

76.

Demir

McNeese

N. J.

Cooke

N. J.

(2016). Team communication behaviors of the human-automation teaming. In 2016 IEEE international multi-disciplinary conference on cognitive methods in situation awareness and decision support (CogSIMA) (pp. 28–34). IEEE. https://doi.org/10.1109/COGSIMA.2016.7497782

77.

Demir

McNeese

N. J.

Cooke

N. J.

(2017b). Team situation awareness within the context of human-autonomy teaming. Cognitive Systems Research, 46, 3–12. https://doi.org/10.1016/j.cogsys.2016.11.003

78.

Demir

McNeese

N. J.

Cooke

N. J.

(2018b). The impact of perceived autonomous agents on dynamic team behaviors. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(4), 258–267. https://doi.org/10.1109/TETCI.2018.2829985

79.

Demir

McNeese

N. J.

Cooke

N. J.

(2019b). The evolution of human-autonomy teams in remotely piloted aircraft systems operations. Frontiers in Communication, 4, 50. https://doi.org/10.3389/fcomm.2019.00050

80.

Demir

McNeese

N. J.

Cooke

N. J.

(2020b). Understanding human-robot teams in light of all-human teams: Aspects of team interaction and shared cognition. International Journal of Human-Computer Studies, 140, 102436. https://doi.org/10.1016/j.ijhcs.2020.102436

81.

Demir

McNeese

N. J.

Gorman

J. C.

Cooke

N. J.

Myers

C. W.

Grimm

D. A.

(2021). Exploration of teammate trust and interaction dynamics in human-autonomy teaming. IEEE Transactions on Human-Machine Systems, 51(6), 696–705. https://doi.org/10.1109/thms.2021.3115058

82.

Demir

McNeese

N. J.

Johnson

Gorman

J. C.

Grimm

Cooke

N. J.

(2019c). Effective team interaction for adaptive training and situation awareness in human-autonomy teaming. In 2019 IEEE conference on cognitive and computational aspects of situation management (CogSIMA) (pp. 122–126). IEEE. https://doi.org/10.1109/COGSIMA.2019.8724202

83.

de Rooij

Tisza

A. B.

Borst

(2024). Flight-based control allocation: Towards human–autonomy teaming in air traffic control †. Aerospace, 11(11), 919. https://doi.org/10.3390/aerospace11110919

84.

Devine

D. J.

(2002). A review and integration of classification systems relevant to teams in organizations. Group Dynamics: Theory, Research, and Practice, 6(4), 291–310. https://doi.org/10.1037/1089-2699.6.4.291

85.

De Visser

Parasuraman

(2011). Adaptive aiding of human-robot teaming: effects of imperfect automation on performance, trust, and workload. Journal of Cognitive Engineering and Decision Making, 5(2), 209–231. https://doi.org/10.1177/1555343411410160

86.

De Visser

Parasuraman

Freedy

Weltman

(2006). A comprehensive methodology for assessing human-robot team performance for use in training and simulation. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 50(25), 2639–2643. https://doi.org/10.1177/154193120605002507

87.

De Visser

Shaw

Mohamed-Ameen

Parasuraman

(2010). Modeling human-automation team performance in networked systems: Individual differences in working memory count. Proceedings of the Human Factors and Ergonomics Society, 2, 1087–1091. https://doi.org/10.1518/107118110X12829369833529

88.

Dikmen

Farrell

Cao

Burns

(2019). The effects of automation and role allocation on team performance. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 63(1), 235–239. https://doi.org/10.1177/1071181319631501

89.

Dikmen

Farrell

Cao

Burns

(2020). The burden of communication: Effects of automation support and automation transparency on team performance. In 2020 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 2227–2231). IEEE. https://doi.org/10.1109/SMC42975.2020.9282913

90.

Doherty

Spencer

C. A.

Eloy

Kumar

Dickler

Hirshfield

(2023). Using speech patterns to model the dimensions of teamness in human-agent teams. In Proceedings of the 25th international conference on multimodal interaction (pp. 640–648). Association for Computing Machinery.

91.

Duan

Weng

Scalia

M. J.

Zhang

Tuttle

Yin

McNeese

N. J.

(2024a). Getting along with autonomous teammates: Understanding the socio-emotional and teaming aspects of trust in human-autonomy teams. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting ((Vol. 68, No. 1, pp. 1531–1536). Sage CA: Los Angeles, CA: SAGE Publications.

92.

Duan

Zhou

Scalia

M. J.

Yin

Weng

Zhang

Freeman

McNeese

Gorman

Tolston

(2024b). Understanding the evolvement of trust over time within human-ai teams. Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), 1–31. https://doi.org/10.1145/3687060

93.

Dyer

J. L.

(1984). Team research and team training: A state-of-the-art review. Human factors review, 26, 285–323.

94.

El Iskandarani

Atweh

J. A.

McGarry

S. P. D.

Riggs

S. L.

Moacdieh

N. M.

(2023). Does it MultiMatch? What scanpath comparison tells Us about task performance in teams. Journal of Cognitive Engineering and Decision Making, 17(3), 294–309. https://doi.org/10.1177/15553434231171484

95.

Eloy

Spencer

Doherty

Hirshfield

(2023). Capturing the dynamics of trust and team processes in human-human-agent teams via multidimensional neural recurrence analyses. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–23. https://doi.org/10.1145/3579598

96.

Endsley

M. R.

Kaber

D. B.

(1999). Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42(3), 462–492. https://doi.org/10.1080/001401399185595

97.

Esterwood

Robert

L. P.

(2023). The theory of mind and human–robot trust repair. Scientific Reports, 13(1), 9877. https://doi.org/10.1038/s41598-023-37032-0

98.

Evertsz

Thangarajah

(2020). A framework for engineering human/agent teaming systems. Proceedings of the AAAI Conference on Artificial Intelligence, 34(03), 2477–2484. https://doi.org/10.1609/aaai.v34i03.5629

99.

Faber

Bennewitz

Eppner

Gorog

Gonsior

Joho

Behnke

(2009). The humanoid museum tour guide robotinho. In RO-MAN 2009-the 18th IEEE international symposium on robot and human interactive communication, Toyama, Japan, 27 September 2009–02 October 2009, pp. 891–896. https://doi.org/10.1109/ROMAN.2009.5326326

100.

Fahim

M. A. A.

Khan

M. M. H.

Jensen

Albayram

(2023). Human vs. automation: Which one will you trust more if you are about to lose money? International Journal of Human-Computer Interaction, 39(12), 2420–2435. https://doi.org/10.1080/10447318.2022.2076772

101.

Fahnenstich

Rieger

Roesler

(2024). Trusting under risk – Comparing human to AI decision support agents. Computers in Human Behavior, 153, 108107. https://doi.org/10.1016/j.chb.2023.108107

102.

Fan

McNeese

Sun

Hanratty

Allender

Yen

(2010a). Human-agent collaboration for time-stressed multicontext decision making. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(2), 306–320. https://doi.org/10.1109/TSMCA.2009.2035302

103.

Fan

McNeese

Yen

(2010b). NDM-based cognitive agents for supporting decision-making teams. Human-Computer Interaction, 25(3), 195–234. https://doi.org/10.1080/07370020903586720

104.

Fan

McNeese

Yen

Cuevas

Strater

Endsley

M. R.

(2008). The influence of agent reliability on trust in human-agent collaboration. In Proceedings of the 15th European conference on cognitive ergonomics: The ergonomics of cool interaction (pp. 1–8). Association for Computing Machinery. https://doi.org/10.1145/1473018.1473028

105.

Fan

Sun

McNeese

Yen

(2006). RPD-enabled agents teaming with humans for multi-context decision making. In Proceedings of the fifth international joint conference on autonomous agents and multiagent systems (pp. 34–41). Association for Computing Machinery. https://doi.org/10.1145/1160633.1160637

106.

Fan

Sun

McNeese

Yen

(2005). Extending the recognition-primed decision model to support human-agent collaboration. In Proceedings of the fourth international joint conference on autonomous agents and multiagent systems (pp. 945–952). Association for Computing Machinery. https://doi.org/10.1145/1082473.1082616

107.

Fan

Yen

(2011). Modeling cognitive loads for evolving shared mental models in human–agent collaboration. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics: A Publication of the IEEE Systems, Man, and Cybernetics Society, 41(2), 354–367. https://doi.org/10.1109/TSMCB.2010.2053705

108.

Farah

Y. A.

Dorneich

M. C.

(2024). Human-autonomy teaming in a cooperative gamified testbed: How can AI teammates support teamwork processes? In Proceedings of the human factors and ergonomics society annual meeting (Vol. 68, No. 1, pp. 386–392). Sage CA: Los Angeles, CA: SAGE Publications.

109.

Flathmann

Duan

Mcneese

N. J.

Hauptman

Zhang

(2024). Empirically understanding the potential impacts and process of social influence in human-AI teams. Proceedings of the ACM on Human-Computer Interaction, 8(49), 1–49. https://doi.org/10.1145/3637326

110.

Flathmann

Schelble

B. G.

Rosopa

P. J.

McNeese

N. J.

Mallick

Madathil

K. C.

(2023). Examining the impact of varying levels of AI teammate influence on human-AI teams. International Journal of Human-Computer Studies, 177, 103061. https://doi.org/10.1016/j.ijhcs.2023.103061

111.

Freiman

Caisse

Ball

Halverson

Myers

(2018). Empirically identified gaps in a situation awareness model for human-machine coordination. In 2018 IEEE conference on cognitive and computational aspects of situation management (CogSIMA) (pp. 110–116). IEEE. https://doi.org/10.1109/COGSIMA.2018.8423980

112.

Frericks

Kang

Outland

Doshi

Johnsen

Schecter

(2024). Trust and collaboration testing in controlled human-robot environments. In Proceedings - 2024 IEEE 6th international conference on cognitive machine intelligence, CogMI 2024 (pp. 127–136). IEEE. https://doi.org/10.1109/CogMI62246.2024.00026

113.

Frieder

Lin

Kraus

(2021). Agent-human coordination with communication costs under uncertainty. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1557–1563. https://doi.org/10.1609/aaai.v26i1.8329

114.

Gall

Stanton

C. J.

(2024). Low-rank human-like agents are trusted more and blamed less in human-autonomy teaming. Frontiers in Artificial Intelligence, 7, 1273350. https://doi.org/10.3389/frai.2024.1273350

115.

Gao

(2013). Modeling teamwork of multi-human multi-agent teams. In Proceedings of the ACM conference on computer supported cooperative work, CSCW (pp. 47–50). Association for Computing Machinery. https://doi.org/10.1145/2441955.2441969

116.

Gao

Cummings

(2014). Barriers to robust and effective human-agent teamwork. AAAI Spring Symposium - Technical Report, SS-14-04, 36–41.

117.

Gao

Cummings

M. L.

Solovey

(2016). Designing for robust and effective teamwork in humanagent teams. In Robust intelligence and trust in autonomous systems (pp. 167–190). Boston, MA: Springer US.

118.

Gao

Cummings

M. L.

Bertuccelli

L. F.

(2012-03). Teamwork in controlling multiple robots. In 2012 7th ACM/IEEE international conference on human-robot interaction (HRI), Boston, MA, USA, 05–08 March 2012, pp. 81–88.

119.

Gao

Cummings

M. L.

Solovey

E. T.

(2014). Modeling teamwork in supervisory control of multiple robots. IEEE Transactions on Human-Machine Systems, 44(4), 441–453. https://doi.org/10.1109/THMS.2014.2312391

120.

Goodman

Miller

M. E.

Rusnock

C. F.

Bindewald

(2016). Timing within human-agent interaction and its effects on team performance and human behavior. In 2016 IEEE international multi-disciplinary conference on cognitive methods in situation awareness and decision support (CogSIMA) (pp. 35–41). IEEE. https://doi.org/10.1109/COGSIMA.2016.7497783

121.

Gorman

J. C.

Demir

Cooke

N. J.

Grimm

D. A.

(2019). Evaluating sociotechnical dynamics in a simulated remotely-piloted aircraft system: A layered dynamics approach. Ergonomics, 62(5), 629–643. https://doi.org/10.1080/00140139.2018.1557750

122.

Graf

Antoni

C. H.

Müller

Schischke

Ellwart

(2023). Effects of automated communication on team members’ activity and social presence awareness, commitment, and motivation in human-autonomy teams. Computers in Human Behavior, 149, 107925. https://doi.org/10.1016/j.chb.2023.107925

123.

Grimm

Demir

Gorman

J. C.

Cooke

N. J.

(2018a). The complex dynamics of team situation awareness in human-autonomy teaming. In 2018 IEEE conference on cognitive and computational aspects of situation management (CogSIMA) (pp. 103–109). IEEE. https://doi.org/10.1109/COGSIMA.2018.8423990

124.

Grimm

Demir

Gorman

J. C.

Cooke

N. J.

(2018b). Systems level evaluation of resilience in human-autonomy teaming under degraded conditions. In 2018 resilience week (RWS) (pp. 124–130). IEEE. https://doi.org/10.1109/RWEEK.2018.8473561

125.

Grimm

D. A.

Gorman

J. C.

Cooke

N. J.

Demir

McNeese

N. J.

(2023). Dynamical measurement of team resilience. Journal of Cognitive Engineering and Decision Making, 17(4), 351–382. https://doi.org/10.1177/15553434231199729

126.

Guo

Yang

X. J.

Shi

(2023a). Enabling team of teams: A trust inference and propagation (TIP) model in multi-human multi-robot teams. In Robotics: Science and systems XIX.

127.

Guo

Yang

X. J.

Shi

(2023b). TIP: A trust inference and propagation model in multi-human multi-robot teams. In Companion of the 2023 ACM/IEEE international conference on human-robot interaction (pp. 639–643). Association for Computing Machinery. https://doi.org/10.1145/3568294.3580164

128.

Guo

Yang

X. J.

Shi

(2024). TIP: A trust inference and propagation model in multi-human multi-robot teams. Autonomous Robots, 48(7), 1–19. https://doi.org/10.1007/s10514-024-10175-3

129.

Hafızoğlu

F. M.

Sen

(2018). The effects of past experience on trust in repeated human-agent teamwork. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 514–522). Association for Computing Machinery. https://doi.org/10.5555/3237383.3237460

130.

Hafızoğlu

F. M.

Sen

(2018). Reputation based trust in human-agent teamwork without explicit coordination. In Proceedings of the 6th international conference on human-agent interaction (pp. 238–245). Association for Computing Machinery. https://doi.org/10.1145/3284432.3284454

131.

Hafızoğlu

F. M.

Sen

(2019). Understanding the influences of past experience on trust in human-agent teamwork. ACM Transactions on Internet Technology, 19(4), 1–22. https://doi.org/10.1145/3324300

132.

Hafızoğlu

F. M.

Sen

(2020). Comparing human trust attitudes towards human and agent teammates. In Proceedings of the 8th international conference on human-agent interaction, HAI 2020 (pp. 50–59). Association for Computing Machinery. https://doi.org/10.1145/3406499.3415082

133.

Han

Qiu

Cheng

Ray

(2024). When teams embrace AI: Human collaboration strategies in generative prompting in a creative design task. In Conference on human factors in computing systems - Proceedings (pp. 1–14). Association for Computing Machinery. https://doi.org/10.1145/3613904.3642133

134.

Hancock

P. A.

Billings

D. R.

Schaefer

K. E.

Chen

J. Y.

De Visser

E. J.

Parasuraman

(2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5), 517–527. https://doi.org/10.1177/0018720811417254

135.

Hanna

Richards

(2014). The impact of communication on a human-agent shared mental model and team performance. In Proceedings of the 2014 international conference on autonomous agents and multi-agent systems (pp. 1485–1486). Citeseer. https://doi.org/10.5555/2615731.2616024

136.

Hanna

Richards

et al. (2015). The impact of virtual agent personality on a shared mental model with humans during collaboration. In AAMAS (pp. 1777–1778). Association for Computing Machinery. https://doi.org/10.5555/2772879.2773432

137.

Hanna

Richards

(2018). The impact of multimodal communication on a shared mental model, trust, and commitment in human–intelligent virtual agent teams. Multimodal Technologies and Interaction, 2(3), 48. https://doi.org/10.3390/mti2030048

138.

Hanna

Richards

(2019). Speech act theory as an evaluation tool for human-agent communication. Algorithms, 12(4), 79. https://doi.org/10.3390/A12040079

139.

Hanna

Richards

Hitchens

(2013). Evaluating the impact of the human-agent teamwork communication model (HAT-CoM) on the development of a shared mental model. Lecture Notes in Computer Science, 8291, 453–460. https://doi.org/10.1007/978-3-642-44927-7_34

140.

Harbers

Bradshaw

J. M.

Johnson

Feltovich

van den Bosch

Meyer

J.-J.

(2011a). Explanation and coordination in human-agent teams: A study in the BW4T testbed. In 2011 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology (pp. 17–20). IEEE. https://doi.org/10.1109/WI-IAT.2011.83

141.

Harbers

Bradshaw

J. M.

Johnson

Feltovich

van den Bosch

Meyer

J.-J.

(2011b). Explanation in human-agent teamwork. In International workshop on coordination, organizations, institutions, and norms in agent systems (pp. 21–37). Springer. https://doi.org/10.1007/978-3-642-35545-5_2

142.

Harriott

C. E.

Zhuang

Adams

J. A.

DeLoach

S. A.

(2012). Towards using human performance moderator functions in human-robot teams. In First International Workshop on Human-Agent Interaction Design and Models.

143.

Harris

Raviv

(2002). Organization design. Management Science, 48(7), 852–865. https://doi.org/10.1287/mnsc.48.7.852.2821

144.

Hauptman

A. I.

Schelble

B. G.

Duan

Flathmann

McNeese

N. J.

(2024). Understanding the influence of AI autonomy on AI explainability levels in human-AI teams using a mixed methods approach. Cognition, Technology & Work, 26(3), 435–455. https://doi.org/10.1007/s10111-024-00765-7

145.

Hillesheim

A. J.

Rusnock

C. F.

(2016). Predicting the effects of automation reliability rates on human-automation team performance. In 2016 winter simulation conference (WSC) (pp. 1802–1813). IEEE. https://doi.org/10.1109/WSC.2016.7822227

146.

Hoffman

Breazeal

(2007). Effects of anticipatory action on human-robot teamwork efficiency, fluency, and perception of team. In Proceedings of the ACM/IEEE international conference on human-robot interaction (pp. 1–8). Association for Computing Machinery. https://doi.org/10.1145/1228716.1228718

147.

Hollenbeck

J. R.

Beersma

Schouten

M. E.

(2012). Beyond team types and taxonomies: A dimensional scaling conceptualization for team description. Academy of Management Review, 37(1), 82–106. https://doi.org/10.5465/amr.2010.0181

148.

Johnson

C. J.

Demir

McNeese

N. J.

Gorman

J. C.

Wolff

A. T.

Cooke

N. J.

(2023). The impact of training on human–autonomy team communications and trust calibration. Human Factors, 65(7), 1554–1570. https://doi.org/10.1177/00187208211047323

149.

Johnson

C. J.

Demir

Zabala

G. M.

Grimm

D. A.

Radigan

Gorman

J. C.

(2020). Training and verbal communications in human-autonomy teaming under degraded conditions. In 2020 IEEE conference on cognitive and computational aspects of situation management (CogSIMA) (pp. 53–58). IEEE.

150.

Johnson

Bradshaw

J. M.

Feltovich

P. J.

Jonker

van Riemsdijk

Sierhuis

(2012). Autonomy and interdependence in human-agent-robot teams. IEEE Intelligent Systems, 27(2), 43–51. https://doi.org/10.1109/MIS.2012.1

151.

Johnson

Jonker

van Riemsdijk

Feltovich

P. J.

Bradshaw

J. M.

(2009). Joint activity testbed: Blocks world for teams (BW4T). In Engineering societies in the agents world X: 10th international workshop, ESAW 2009, Utrecht, The Netherlands, November 18-20, 2009. Proceedings (pp. 254–256). Springer.

152.

Jorge

C. C.

Jonker

C. M.

Tielman

M. L.

(2024). How should an AI trust its human teammates? Exploring possible cues of artificial trust. ACM Transactions on Interactive Intelligent Systems, 14(1), 1–5. https://doi.org/10.1145/3635475

153.

Kaelin

V. C.

Tewari

Benouar

Lindgren

(2024). Developing teamwork: Transitioning between stages in human-agent collaboration. Frontiers in Computer Science, 6, 1455903. https://doi.org/10.3389/fcomp.2024.1455903

154.

Karwowski

Salvendy

Endsley

Rouse

Salmon

Stanney

Stanton

(2025). Grand challenges for human factors and ergonomics. Theoretical Issues in Ergonomics Science, 26(4), 361–456.

155.

Keyton

Beck

S. J.

(2008). Team attributes, processes, and values: A pedagogical framework. Business Communication Quarterly, 71(4), 488–504. https://doi.org/10.1177/1080569908325863

156.

Khavas

Z. R.

Kotturu

M. R.

Azadeh

Robinette

(2024). Do humans have different expectations regarding humans and robots’ morality? In 2024 33rd IEEE international conference on robot and human interactive communication (RO-MAN) (pp. 1126–1133). IEEE.

157.

Khavas

Z. R.

Robinette

(2024). Human-robot interaction experiment: Minor changes; significant differences. In Proceedings of the second international symposium on trustworthy autonomous systems (pp. 1–13). Association for Computing Machinery.

158.

Klein

Wiggins

Dominguez

C. O.

(2010). Team sensemaking. Theoretical Issues in Ergonomics Science, 11(4), 304–320. https://doi.org/10.1080/14639221003729177

159.

Kox

E. S.

Kerstholt

J. H.

Hueting

T. F.

de Vries

P. W.

(2021). Trust repair in human-agent teams: The effectiveness of explanations and expressing regret. Autonomous Agents and Multi-Agent Systems, 35(2), 30. https://doi.org/10.1007/s10458-021-09515-9

160.

Kox

E. S.

van den Boogaard

Turjaka

Kerstholt

J. H.

(2024). The journey or the destination: The impact of transparency and goal attainment on trust in human-robot teams. ACM Transactions on Human-Robot Interaction, 14(2), 1–23. https://doi.org/10.1145/3702245

161.

Kulms

Kopp

(2019). More human-likeness, more trust? The effect of anthropomorphism on self-reported and behavioral trust in continued and interdependent human–agent cooperation. In ACM international conference proceeding series (pp. 31–42). Association for Computing Machinery. https://doi.org/10.1145/3340764.3340793

162.

Lakhmani

S. G.

Wright

J. L.

Schwartz

Barber

(2020). Exploring the effect of communication patterns and transparency on the attitudes towards robots. Advances in Intelligent Systems and Computing, 958, 27–36. https://doi.org/10.1007/978-3-030-20148-7_3

163.

Lakhmani

S. G.

Wright

J. L.

Schwartz

M. R.

Barber

(2019). Exploring the effect of communication patterns and transparency on performance in a human-robot team. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 63(1), 160–164. https://doi.org/10.1177/1071181319631054

164.

Large

Stodolski

Vázquez

(2020). Studying human-agent interactions in Space Invaders. In HAI 2020 - proceedings of the 8th international conference on human-agent interaction (pp. 245–247). Association for Computing Machinery. https://doi.org/10.1145/3406499.3418747

165.

Lavender

Abuhaimed

Sen

(2023). Relative effects of positive and negative explanations on satisfaction and performance in human-agent teams. Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS, 36. https://doi.org/10.32473/flairs.36.133371

166.

Lavender

Abuhaimed

Sen

(2024). Effects of explanation types on user satisfaction and performance in human-agent teams. The International Journal on Artificial Intelligence Tools, 33(3), 2460004. https://doi.org/10.1142/S0218213024600042

167.

Le Guillou

Prévot

Berberian

(2023). Trusting artificial agents: Communication trumps performance. In AAMAS 2023 (pp. 299–306). Association for Computing Machinery.

168.

Lenox

Lewis

Hahn

Payne

Sycara

(2000). Task characteristics and intelligent aiding [route-planning tasks]. In SMC 2000 conference proceedings. 2000 IEEE international conference on systems, man and cybernetics. ‘Cybernetics evolving to systems, humans, organizations, and their complex interactions’ (Cat. No. 00CH37166) (Vol. 2, pp. 1123–1127). IEEE. https://doi.org/10.1109/ICSMC.2000.886002

169.

Lenox

Hahn

Lewis

Roth

(1999). Improving performance: Should we support individuals or teams? Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 43(3), 223–227. https://doi.org/10.1177/154193129904300320

170.

Lenox

Lewis

Roth

Shern

Roberts

Rafalski

Jacobson

(1998). Support of teamwork in human-agent teams. In 1998 IEEE international conference on systems, man, and cybernetics (Vol. 1-5, pp. 1341–1346). IEEE.

171.

Lenox

Roberts

Lewis

(1997). Human-agent interaction in a target identification task. In 1997 IEEE international conference on systems, man, and cybernetics. Computational cybernetics and simulation (Vol. 3, pp. 2702–2706). IEEE. https://doi.org/10.1109/ICSMC.1997.635346

172.

León

G. A.

Chiou

E. K.

and

A. W.

(2021). Accountability increases resource sharing: Effects of accountability on human and AI system performance. International Journal of Human-Computer Interaction, 37(5), 434–444. https://doi.org/10.1080/10447318.2020.1824695

173.

Lewis

Sycara

Payne

(2003). Agent roles in human teams.

174.

Agrawal

Jia

Raja

Gui

Hughes

Lewis

Sycara

(2021). Individualized mutual adaptation in human-agent teams. IEEE Transactions on Human-Machine Systems, 51(6), 706–714. https://doi.org/10.1109/thms.2021.3107675

175.

Dong

Chiou

E. K.

(2020). Reciprocity and its neurological correlates in human-agent cooperation. IEEE Transactions on Human-Machine Systems, 50(5), 384–394. https://doi.org/10.1109/thms.2020.2992224

176.

Sun

Miller

(2016). Communication in human-agent teams for tasks with joint action. Lecture Notes in Computer Science, 9628, 224–241. https://doi.org/10.1007/978-3-319-42691-4_13

177.

Zhang

Sun

Zhang

Wen

Wang

Pan

(2024). Tackling cooperative incompatibility for zero-shot human-AI coordination. Journal of Artificial Intelligence Research, 80, 1139–1185. https://doi.org/10.1613/jair.1.15884

178.

Lin

Panganiban

A. R.

Matthews

Gibbins

Ankeney

See

Bailey

Long

(2022). Trust in the danger zone: Individual differences in confidence in robot threat assessments. Frontiers in Psychology, 13, 601523. https://doi.org/10.3389/fpsyg.2022.601523

179.

Liu

Hamrick

J. B.

Fisac

J. F.

Dragan

A. D.

Hedrick

J. K.

Sastry

S. S.

Griffiths

T. L.

(2016). Goal inference improves objective and perceived performance in human-robot collaboration. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (pp. 940–948).

180.

Liu

K.-Y.

Volonte

Hsu

Y.-C.

Babu

S. V.

Wong

S.-K.

(2019). Interaction with proactive and reactive agents in box manipulation tasks in virtual environments. Computer Animation and Virtual Worlds, 30(3), Article e1881. https://doi.org/10.1002/cav.1881

181.

Long

R. J.

Ariyo

A. G.

Iwu

Z. I.

Azari

D. P.

Madison

A. M.

Aroca-Ouellette

Ries

A. J.

(2024). Towards an adaptive system for investigating human-agent teaming: Let’s get cooking. In 2024 Systems and information engineering design symposium (SIEDS), Charlottesville, VA, USA, 03–03 May 2024, pp. 1–6.

182.

Lyons

J. B.

Mator

J. D.

Orr

Alarcon

G. M.

Barrera

(2024). Is the pull-down effect overstated? An examination of trust propagation among fighter pilots in a high-fidelity simulation. Journal of Cognitive Engineering and Decision Making, 18(2), 99–113. https://doi.org/10.1177/15553434231225909

183.

Lyons

J. B.

Sycara

Lewis

Capiola

(2021). Human–autonomy teaming: Definitions, debates, and directions. Frontiers in Psychology, 12, 589585. https://doi.org/10.3389/fpsyg.2021.589585

184.

Mallick

Flathmann

Duan

Schelble

B. G.

McNeese

N. J.

(2024a). What you say vs what you do: Utilizing positive emotional expressions to relay AI teammate intent within human–AI teams. International Journal of Human-Computer Studies, 192, 103355. https://doi.org/10.1016/j.ijhcs.2024.103355

185.

Mallick

Flathmann

Lancaster

Hauptman

McNeese

Freeman

(2024b). The pursuit of happiness: The power and influence of AI teammate emotion in human-AI teamwork. Behaviour & Information Technology, 43(14), 3436–3460. https://doi.org/10.1080/0144929X.2023.2277909

186.

Matthews

Cumings

Casey

Panganiban

A. R.

Chella

Pipitone

Mouloua

(2024). Compromise in human-robot collaboration for threat assessment. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 68(1), 329–335. https://doi.org/10.1177/10711813241276449

187.

McChrystal

G. S.

Collins

Silverman

Fussell

(2015). Team of teams: New rules of engagement for a complex world. Penguin Publishing Group.

188.

McKendrick

Shaw

De Visser

Saqer

Kidwell

Parasuraman

(2014). Team performance in networked supervisory control of unmanned air vehicles: Effects of automation, working memory, and communication content. Human Factors, 56(3), 463–475. https://doi.org/10.1177/0018720813496269

189.

McKendrick

Shaw

Saqer

De Visser

Parasuraman

(2011). Team performance and communication within networked supervisory control human-machine systems. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 55(1), 262–266. https://doi.org/10.1177/1071181311551054

190.

McNeese

N. J.

Demir

Chiou

Cooke

Yanikian

(2019). Understanding the role of trust in human-autonomy teaming. In 52nd Annual Hawaii International Conference on System Sciences, HICSS 2019 (pp. 254–263). IEEE Computer Society.

191.

McNeese

N. J.

Demir

Chiou

E. K.

Cooke

N. J.

(2021a). Trust and team performance in human–autonomy teaming. International Journal of Electronic Commerce, 25(1), 51–72. https://doi.org/10.1080/10864415.2021.1846854

192.

McNeese

N. J.

Demir

Cooke

N. J.

Myers

(2018). Teaming with a synthetic teammate: Insights into human-autonomy teaming. Human Factors, 60(2), 262–273. https://doi.org/10.1177/0018720817743223

193.

McNeese

N. J.

Demir

Cooke

N. J.

She

(2021b). Team situation awareness and conflict: A study of human–machine teaming. Journal of Cognitive Engineering and Decision Making, 15(2-3), 83–96. https://doi.org/10.1177/15553434211017354

194.

McNeese

N. J.

Schelble

B. G.

Canonico

L. B.

Demir

(2021c). Who/What is my teammate? Team composition considerations in human–AI teaming. IEEE Transactions on Human-Machine Systems, 51(4), 288–299. https://doi.org/10.1109/thms.2021.3086018

195.

Meimandi

K. J.

Bolton

M. L.

Beling

P. A.

(2024). Action over words: Predicting human trust in AI partners through gameplay behaviors. In 2024 33rd IEEE international conference on robot and human interactive communication (RO-MAN), Pasadena, CA, USA, 26–30 August 2024, pp. 563–568.

196.

Memar

A. H.

Esfahani

E. T.

(2018). Physiological measures for human performance analysis in human-robot teamwork: Case of tele-exploration. IEEE Access, 6, 3694–3705. https://doi.org/10.1109/ACCESS.2018.2790838

197.

Mercado

J. E.

Rupp

M. A.

Chen

J. Y.

Barnes

M. J.

Barber

Procci

(2016). Intelligent agent transparency in human–agent teaming for multi-UxV management. Human Factors, 58(3), 401–415. https://doi.org/10.1177/0018720815621206

198.

Moher

Liberati

Tetzlaff

Altman

D. G.

PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), Article e1000097. https://doi.org/10.1371/journal.pmed.1000097

199.

Momose

Mehta

Moukpe

Weekes

T. R.

Eskridge

T. C.

(2024). Human-AI teamwork interface design using patterns of interactions. International Journal of Human-Computer Interaction, 41(11), 7112–7134. https://doi.org/10.1080/10447318.2024.2389350

200.

Momose

Weekes

Mehta

Wright

Moukpe

Eskridge

(2023). Patterns of effective human-agent teams. In Extended abstracts of the 2023 CHI conference on human factors in computing systems (pp. 1–13). Association for Computing Machinery.

201.

Mullins

Necaise

Fiore

S. M.

Amon

M. J.

(2024). Navigating trust: The interplay of trust in automation and team communication in an extended simulated military mission. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 68, No. 1, pp. 305–309). SAGE Publications.

202.

Musick

O’Neill

T. A.

Schelble

B. G.

McNeese

N. J.

Henke

J. B.

(2021). What happens when humans believe their teammate is an AI? An investigation into humans teaming with autonomy. Computers in Human Behavior, 122, 106852. https://doi.org/10.1016/j.chb.2021.106852

203.

Myers

Ball

Cooke

Freiman

Caisse

Rodgers

Demir

McNeese

(2019). Autonomous intelligent agents for team training. IEEE Intelligent Systems, 34(2), 3–14. https://doi.org/10.1109/MIS.2018.2886670

204.

Nagi

Giusti

Gambardella

L. M.

Di Caro

G. A.

(2014). Human-swarm interaction using spatial gestures. In 2014 IEEE/RSJ international conference on intelligent robots and systems (pp. 3834–3841). IEEE. https://doi.org/10.1109/IROS.2014.6943101

205.

Narayanan

Zhang

Mendoza

Kambhampati

(2015). Automated planning for peer-to-peer teaming and its evaluation in remote human-robot interaction. In Proceedings of the tenth annual ACM/IEEE international conference on human-robot interaction extended abstracts (pp. 161–162). Association for Computing Machinery. https://doi.org/10.1145/2701973.2702042

206.

Natarajan

Xue

van Waveren

Feigh

Gombolay

(2024). Mixed-initiative human-robot teaming under suboptimality with online Bayesian adaptation. In Proceedings of the international joint conference on autonomous agents and multiagent systems, AAMAS (pp. 1454–1462). Association for Computing Machinery.

207.

Nguyen

Carmody

Wildman

Carroll

(2022). Developing a dynamic testbed: Designing for trust in complex human agent teams. In 2022 IEEE 3rd international conference on human-machine systems (ICHMS), Orlando, FL, USA, 17–19 November 2022, pp. 1–4.

208.

O’Neill

McNeese

Barron

Schelble

(2022). Human–autonomy teaming: A review and analysis of the empirical literature. Human Factors, 64(5), 904–938. https://doi.org/10.1177/0018720820960865

209.

Paleja

Ghuy

Ranawaka Arachchige

Jensen

Gombolay

(2021). The utility of explainable AI in ad hoc human-machine teaming. Advances in Neural Information Processing Systems, 34, 610–623.

210.

Paleja

Munje

Chang

K. C.

Jensen

Gombolay

(2024). Designs for enabling collaboration in human-machine teaming via interactive and explainable systems. In Advances in neural information processing systems (p. 28). Association for Computing Machinery.

211.

Panganiban

A. R.

Matthews

Long

M. D.

(2020). Transparency in autonomous teammates: Intention to support as teaming information. Journal of Cognitive Engineering and Decision Making, 14(2), 174–190. https://doi.org/10.1177/1555343419881563

212.

Parasuraman

Sheridan

T. B.

Wickens

C. D.

(2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(3), 286–297. https://doi.org/10.1109/3468.844354

213.

Payne

Lenox

Hahn

Lewis

Sycara

(2000). Agent-based team aiding in a time critical task. In Proceedings of the 33rd annual Hawaii international conference on system sciences, Maui, HI, USA, 07–07 January 2000, p. 9. https://doi.org/10.1109/HICSS.2000.926632

214.

Prada

Paiva

(2009). Teaming up humans with autonomous synthetic characters. Artificial Intelligence, 173(1), 80–103. https://doi.org/10.1016/j.artint.2008.08.006

215.

Pugh

D. S.

Hickson

D. J.

Hinings

C. R.

(1969). An empirical taxonomy of structures of work organizations. Administrative Science Quarterly, 14(1), 115–126. https://doi.org/10.2307/2391367

216.

Qian

Orlov Savko

Neubauer

Gremillion

Unhelkar

(2024). Measuring variations in workload during human-robot collaboration through automated after-action reviews. In Companion of the 2024 ACM/IEEE international conference on human-robot interaction (pp. 852–856). Association for Computing Machinery.

217.

Rebensky

Carmody

Ficke

Carroll

Bennett

(2022). Teammates instead of tools: The impacts of level of autonomy on mission performance and human–agent teaming dynamics in multi-agent distributed teams. Frontiers in Robotics and AI, 9, 782134. https://doi.org/10.3389/frobt.2022.782134

218.

Rezaei Khavas

Kotturu

M. R.

Ahmadzadeh

S. R.

Robinette

(2024). Do humans trust robots that violate moral trust? ACM Transactions on Human-Robot Interaction, 13(2), 1–30. https://doi.org/10.1145/3651992

219.

Rossi

Staffa

Rossi

(2017). Supervisory control of multiple robots through group communication. IEEE Transactions on Cognitive and Developmental Systems, 9(1), 56–67. https://doi.org/10.1109/TCDS.2016.2606562

220.

Ruff

H. A.

Narayanan

Draper

M. H.

(2002). Human interaction with levels of automation and decision-aid fidelity in the supervisory control of multiple simulated unmanned air vehicles. Presence: Teleoperators and Virtual Environments, 11(4), 335–351. https://doi.org/10.1162/105474602760204264

221.

Saavedra

Earley

P. C.

Van Dyne

(1993). Complex interdependence in task-performing groups. Journal of Applied Psychology, 78(1), 61–72. https://doi.org/10.1037/0021-9010.78.1.61

222.

Salas

Dickinson

T. L.

Converse

S. A.

Tannenbaum

S. I.

(1992). Toward an understanding of team performance and training. In Swezey

R. W.

Salas

(Eds.), Teams: Their training and performance (pp. 3–29). Ablex Publishing.

223.

Salas

Sims

D. E.

Burke

C. S.

(2005). Is there a “big five” in teamwork? Small Group Research, 36(5), 555–599. https://doi.org/10.1177/1046496405277134

224.

Salikutluk

Schöpper

Herbert

Scheuermann

Frodl

Balfanz

Koert

(2024). An evaluation of situational autonomy for human-AI collaboration in a shared workspace setting. In Conference on human factors in computing systems - proceedings (pp. 1–17). Association for Computing Machinery. https://doi.org/10.1145/3613904.3642564

225.

Scalia

M. J.

Harrison

J. L.

Zhou

Grimm

D. A.

Gorman

J. C.

(2022a). Interaction with an autonomous team member determines the relationship between team trust and team performance. In 2022 IEEE 3rd international conference on human-machine systems (ICHMS), Orlando, FL, USA, 17–19 November 2022, pp. 1–4.

226.

Scalia

M. J.

Zhou

Grimm

D. A. P.

Harrison

J. L.

Gorman

J. C.

(2022b). The role of timing of information front-loading and planning ahead in all-human vs. human-autonomy team performance. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 66(1), 530–534. https://doi.org/10.1177/1071181322661251

227.

Schadd

M. P. D.

Schoonderwoerd

T. A. J.

van den Bosch

Visker

O. H.

Haije

Veltman

K. H. J.

(2022). “I’m afraid I can’t do that, Dave”; getting to know your buddies in a human–agent team. Systems, 10(1), 15. https://doi.org/10.3390/systems10010015

228.

Schelble

B. G.

Flathmann

McNeese

N. J.

Freeman

Mallick

(2022a). Let’s think together! Assessing shared mental models, performance, and trust in human-agent teams. Proceedings of the ACM on Human-Computer Interaction, 6(13), 1–13. https://doi.org/10.1145/3492832

229.

Schelble

B. G.

Flathmann

McNeese

N. J.

O’Neill

Pak

Namara

(2023a). Investigating the effects of perceived teammate artificiality on human performance and cognition. International Journal of Human-Computer Interaction, 39(13), 2686–2701. https://doi.org/10.1080/10447318.2022.2085191

230.

Schelble

B. G.

Flathmann

Musick

McNeese

N. J.

Freeman

(2022b). I see you: Examining the role of spatial information in human-agent teams. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2), 1–27. https://doi.org/10.1145/3555099

231.

Schelble

B. G.

Lancaster

Duan

Mallick

McNeese

N. J.

Lopez

(2023b). The effect of AI teammate ethicality on trust outcomes and individual performance in human-AI teams. In HICSS (pp. 322–331).

232.

Schilling

M. A.

(2000). Toward a general modular systems theory and its application to interfirm product modularity. Academy of Management Review, 25(2), 312–334. https://doi.org/10.2307/259016

233.

Schneider

M. F.

Miller

M. E.

McGuirl

(2022). Assessing quality goal rankings as a method for communicating operator intent. Journal of Cognitive Engineering and Decision Making, 17(1), 26–48. https://doi.org/10.1177/15553434221131665

234.

Schneiders

Cheon

Kjeldskov

Rehm

Skov

M. B.

(2022). Non-dyadic interaction: A literature review of 15 years of human-robot interaction conference publications. ACM Transactions on Human-Robot Interaction, 11(2), 13:1–13. https://doi.org/10.1145/3488242

235.

Schoonderwoerd

T. A.

Zoelen

E. M. v.

Bosch

K. v. d.

Neerincx

M. A.

(2022). Design patterns for human-AI co-learning: A wizard-of-oz evaluation in an urban-search-and-rescue task. International Journal of Human-Computer Studies, 164, 102831. https://doi.org/10.1016/j.ijhcs.2022.102831

236.

Schurr

Marecki

Tambe

Scerri

(2005). Towards flexible coordination of human-agent teams. Multiagent and Grid Systems, 1(1), 3–16. https://doi.org/10.3233/MGS-2005-1102

237.

Sheridan

T. B.

Verplank

W. L.

(1978). Human and computer control of undersea teleoperators. Man-Machine Systems Laboratory Report. Cambridge, MA: MIT.

238.

Sidji

Smith

Rogerson

M. J.

(2023). The hidden rules of Hanabi: How humans outperform AI agents. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–16).

239.

Simon

Guérin

Rauffet

Chauvin

Martin

É.

(2023). How humans comply with a (potentially) faulty robot: Effects of multidimensional transparency. IEEE Transactions on Human-Machine Systems, 53(4), 751–760. https://doi.org/10.1109/thms.2023.3273773

240.

Siu

H. C.

Peña

J. D.

Zhou

Chen

Lopez

V. J.

Palko

Allen

R. E.

(2021). Evaluation of human-AI teams for learned and rule-based agents in Hanabi. Advances in Neural Information Processing Systems, 34, 16183–16195.

241.

Song

Soria Zurita

N. F.

Gyory

J. T.

Zhang

McComb

Cagan

Miller

Balon

McComb

Yukish

(2022). Decoding the agility of artificial intelligence-assisted human design teams. Design Studies, 79, 101094. https://doi.org/10.1016/j.destud.2022.101094

242.

Sternberg

R. J.

(1982). Handbook of human intelligence. Cambridge University Press.

243.

Strybel

T. Z.

Keeler

Mattoon

Alvarez

Barakezyan

Barraza

Park

K. P. L.

Battiste

(2018). Measuring the effectiveness of human autonomy teaming. Advances in Intelligent Systems and Computing, 586, 23–33. https://doi.org/10.1007/978-3-319-60642-2_3

244.

Sycara

(2002). Integrating agents into human teams. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 46(3), 413–417. https://doi.org/10.1177/154193120204600342

245.

Sycara

Lewis

Lenox

Roberts

(1998). Calibrating trust to integrate intelligent agents into human teams. In Proceedings of the thirty-first Hawaii international conference on system sciences, Kohala Coast, HI, USA, 09–09 January 1998, pp. 263–268. https://doi.org/10.1109/HICSS.1998.653107

246.

Thylefors

Persson

Hellström

(2005). Team types, perceived efficiency and team climate in Swedish cross-professional teamwork. Journal of Interprofessional Care, 19(2), 102–114. https://doi.org/10.1080/13561820400024159

247.

Tian

Tomizuka

Dragan

A. D.

Bajcsy

(2023). Towards modeling and influencing the dynamics of human learning. In Proceedings of the 2023 ACM/IEEE international conference on human-robot interaction (pp. 350–358). Association for Computing Machinery. https://doi.org/10.1145/3568162.3578629

248.

Tiferes

Bisantz

A. M.

(2018). The impact of team characteristics and context on team communication: An integrative literature review. Applied Ergonomics, 68, 146–159. https://doi.org/10.1016/j.apergo.2017.10.020

249.

Tokadlı

Dorneich

M. C.

(2022). Autonomy as a teammate: Evaluation of teammate-likeness. Journal of Cognitive Engineering and Decision Making, 16(4), 282–300. https://doi.org/10.1177/15553434221108002

250.

Tokadlı

Dorneich

M. C.

Matessa

(2021). Evaluation of playbook delegation approach in human-autonomy teaming for single pilot operations. International Journal of Human-Computer Interaction, 37(7), 703–716. https://doi.org/10.1080/10447318.2021.1890485

251.

Tulli

Correia

Mascarenhas

Gomes

Melo

F. S.

Paiva

(2019). Effects of agents’ transparency on teamwork. In Calvaresi

Najjar

Schumacher

Framling

(Eds.), Explainable, transparent autonomous agents and multi-agent systems, EXTRAAMAS 2019 (Vol. 11763, pp. 22–37). Springer. https://doi.org/10.1007/978-3-030-30391-4_2

252.

Ulusan

Narayan

Snodgrass

Ergun

Harteveld

(2022). “Rather solve the problem from scratch”: Gamesploring human-machine collaboration for optimizing the debris collection problem. In International conference on intelligent user interfaces, proceedings IUI (pp. 604–619). Association for Computing Machinery. https://doi.org/10.1145/3490099.3511163

253.

van den Bosch

van Zoelen

E. M.

Schoonderwoerd

T. A. J.

Solaki

van der Stigchel

Akrum

(2024). Design and effects of co-learning in human-AI teams. Journal of Artificial Intelligence Research, 82, 1445–1493. https://doi.org/10.1613/jair.1.16846

254.

van Wissen

Gal

Kamphorst

Dignum

(2010). Human-agent team formation: An empirical study. In Proc. 22nd Benelux Conf. Artif. Intell. (pp. 1–8).

255.

van Wissen

Gal

Kamphorst

Dignum

(2012). Human-agent teamwork in dynamic environments. Computers in Human Behavior, 28(1), 23–33. https://doi.org/10.1016/j.chb.2011.08.006

256.

Vered

Howe

Miller

Sonenberg

Velloso

(2020). Demand-driven transparency for monitoring intelligent agents. IEEE Transactions on Human-Machine Systems, 50(3), 264–275. https://doi.org/10.1109/THMS.2020.2988859

257.

Verhagen

R. S.

Marcu

Neerincx

M. A.

Tielman

M. L.

(2024). The influence of interdependence on trust calibration in human-machine teams. Frontiers in Artificial Intelligence and Applications, 386, 300–314. https://doi.org/10.3233/FAIA240203

258. Verhagen, R. S., Neerincx, M. A., Parlar, Vogel, Tielman, M. L. (2023). Personalized agent explanations for human-agent teamwork: Adapting explanations to user trust, workload, and performance. In Proceedings of the international joint conference on autonomous agents and multiagent systems, AAMAS (pp. 2316–2318). Association for Computing Machinery.

259. Walliser, J. C., de Visser, E. J., Shaw, T. H. (2023). Exploring system wide trust prevalence and mitigation strategies with multiple autonomous agents. Computers in Human Behavior, 143, 107671. https://doi.org/10.1016/j.chb.2023.107671

260. Walliser, J. C., de Visser, E. J., Wiese, Shaw, T. H. (2019). Team structure and team building improve human–machine teaming with autonomous agents. Journal of Cognitive Engineering and Decision Making, 13(4), 258–278. https://doi.org/10.1177/1555343419867563

261. Walliser, J. C., Mead, P. R., Shaw, T. H. (2017). The perception of teamwork with an autonomous agent enhances affect and performance outcomes. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 61(1), 231–235. https://doi.org/10.1177/1541931213601541

262. Wehbe, R. R., Lank, Nacke, L. E. (2017). Left them 4 dead: Perception of humans versus non-player character teammates in cooperative gameplay. In Proceedings of the 2017 conference on designing interactive systems (pp. 403–415). Association for Computing Machinery. https://doi.org/10.1145/3064663.3064712

263. Wildman, J. L., Nguyen, Thayer, A. L., Robbins-Roth, V. T., Carroll, Carmody, Addis (2024). Trust in human-agent teams: A multilevel perspective and future research agenda. Organizational Psychology Review, 14(3), 373–402. https://doi.org/10.1177/20413866241253278

264. Wildman, J. L., Thayer, A. L., Rosen, M. A., Salas, Mathieu, J. E., Rayne, S. R. (2012). Task types and team-level attributes: Synthesis of team classification literature. Human Resource Development Review, 11(1), 97–129. https://doi.org/10.1177/1534484311417561

265. Wong, Dudek (2019). Investigating trust factors in human-robot shared control: Implicit gender bias around robot voice. In 2019 16th conference on computer and robot vision (CRV), Kingston, QC, Canada, 29–31 May 2019, pp. 195–200. https://doi.org/10.1109/CRV.2019.00034

266. Wright, J. L., Chen, J. Y., Barnes, M. J. (2018). Human–automation interaction for multiple robot control: The effect of varying automation assistance and individual differences on operator performance. Ergonomics, 61(8), 1033–1045. https://doi.org/10.1080/00140139.2018.1441449

267. Wright, J. L., Chen, J. Y., Barnes, M. J., Hancock, P. A. (2016a). Agent reasoning transparency’s effect on operator workload. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 60(1), 249–253. https://doi.org/10.1177/1541931213601057

268. Wright, J. L., Chen, J. Y., Barnes, M. J., Hancock, P. A. (2016b). The effect of agent reasoning transparency on automation bias: An analysis of response performance. In Virtual, augmented and mixed reality: 8th international conference, VAMR 2016, held as part of HCI international 2016, Toronto, Canada, July 17-22, 2016. Proceedings 8 (pp. 465–477). Springer. https://doi.org/10.1007/978-3-319-39907-2_45

269. Wright, J. L., Chen, J. Y., Quinn, S. A., Barnes, M. J. (2013). The effects of level of autonomy on human-agent teaming for multi-robot control and local security maintenance. Report No: ARL-TR-6724, Aberdeen Proving Ground: Army Research Laboratory.

270. Wright, J. L., Lakhmani, S. G., Chen, J. Y. C. (2022). Bidirectional communications in human-agent teaming: The effects of communication style and feedback. International Journal of Human-Computer Interaction, 38(18), 1972–1985. https://doi.org/10.1080/10447318.2022.2068744

271. Wright, J. L., Quinn, S. A., Chen, J. Y., Barnes, M. J. (2014). Individual differences in human-agent teaming: An analysis of workload and situation awareness through eye movements. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 58(1), 1410–1414. https://doi.org/10.1177/1541931214581294

272. Wright, M. C., Kaber, D. B. (2003). Team coordination and strategies under automation. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 47(3), 553–557. https://doi.org/10.1177/154193120304700363

273. Wright, M. C., Kaber, D. B. (2005). Effects of automation of information-processing functions on teamwork. Human Factors, 47(1), 50–66. https://doi.org/10.1518/0018720053653776

274. Dudek (2016). Maintaining efficient collaboration with trust-seeking robots. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea (South), 09–14 October 2016, pp. 3312–3319. https://doi.org/10.1109/IROS.2016.7759510

275. Yuan, Zheng (2025). Bidirectional transparency in human-agent communications: Effects of direction and level of transparency. Ergonomics, 1–19. https://doi.org/10.1080/00140139.2025.2456535

276. Hong, Soria Zurita, N. F., Gyory, J. T., Stump, Nolte, Cagan, McComb (2023). Adaptation and challenges in human-AI partnership for the design of complex engineering systems. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (Vol. 87318, p. V03BT03A056). American Society of Mechanical Engineers.

277. Yang, X. J., Dong, Helander (2012). The analysis of knowledge integration in collaborative engineering teams. Journal of Engineering Design, 23(2), 119–133. https://doi.org/10.1080/09544828.2011.567979

278. Yen, Fan, Sun, Hanratty, Dumer (2006). Agents with shared mental models for enhancing team decision makings. Decision Support Systems, 41(3), 634–653. https://doi.org/10.1016/j.dss.2004.06.008

279. Berkovsky, Taib, Zhou, Chen (2019). Do I trust my machine teammate? An investigation from perception to decision. In Proceedings of the 24th international conference on intelligent user interfaces (pp. 460–468). Association for Computing Machinery. https://doi.org/10.1145/3301275.3302277

280. Zhang, Chong, Kotovsky, Cagan (2023a). Trust in an AI versus a human teammate: The effects of teammate identity and performance on human-AI cooperation. Computers in Human Behavior, 139, 107536. https://doi.org/10.1016/j.chb.2022.107536

281. Zhang, Duan, Flathmann, McNeese, Freeman, Williams (2023b). Investigating AI teammate communication strategies and their impact in human-AI teams for effective teamwork. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1–31. https://doi.org/10.1145/3610072

282. Zhang, Duan, Flathmann, McNeese, Knijnenburg, Freeman (2024). Verbal vs. visual: How humans perceive and collaborate with AI teammates using different communication modalities in various human-AI team compositions. Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), 1–34. https://doi.org/10.1145/3686976

283. Zhao, Simmons, Admoni (2022). The role of adaptation in collective human–AI teaming. Topics in Cognitive Science, 17(2), 291–323. https://doi.org/10.1111/tops.12633

284. Zhu, Wang, Quan, Tang (2023). Complexity-driven trust dynamics in human–robot interactions: Insights from AI-enhanced collaborative engagements. Applied Sciences (Switzerland), 13(24), 12989. https://doi.org/10.3390/app132412989

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
