Abstract
This study develops a team classification scheme for human-agent teaming (HAT) and uses it to analyze 25 testbeds utilized in 68 empirical studies on HAT. The scheme, adapted from an existing classification scheme for human-human teams, consists of nine dimensions: team composition, task interdependence, role structure, leadership structure, authority differentiation, communication structure, communication direction, communication medium, and team life span. We found that a substantial portion of the existing HAT literature focused on teams consisting of one human and one agent, with humans typically assuming leadership roles. Moreover, the dynamics within these teams tended to remain static over time. Our findings highlight the need for further research into diverse team attributes, such as team composition, leadership structure, and communication structure. Such efforts would facilitate a deeper understanding of complex team dynamics in human-agent teaming.
Introduction
Human-agent teaming has emerged as a topic of significant research interest across multiple disciplines, including human factors, computer science, and robotics. Existing studies of human-agent teaming have employed different types of testbeds, most of which were developed by the research teams that used them. Accordingly, these testbeds differ considerably, covering various task domains, goals, and configurations of team members. For example, the MIX (Mixed Initiative eXperimental) testbed, used in numerous studies (e.g., J. Y. Chen et al., 2013), incorporates a RoboLeader agent that collects information from subordinate robots with limited autonomy. Within this testbed, a human operator is responsible for detecting targets using the subordinate robots and for rerouting them with the assistance of the RoboLeader. Another commonly used testbed, CERTT (Cognitive Engineering Research on Team Tasks), simulates a three-member team with specific roles (i.e., navigator, photographer, and pilot). In this testbed, all team members must collaborate effectively to achieve the team objective of photographing ground targets (e.g., Cooke et al., 2016).
These testbeds have supported research evaluating various aspects of human-agent teams, including team performance, workload, and trust. Drawing on the extensive research on human-human teams, a growing number of studies also examine how team characteristics affect human-agent team dynamics; for example, several studies compare team performance across different team configurations (e.g., Fan et al., 2005; Harriott et al., 2012).
However, the available testbeds are not well understood as a whole. Little is known about their features, their distribution, and how they differ from or resemble one another. While a systematic review of previous empirical studies on human-agent teaming has outlined the dependent and independent variables used in the literature (O’Neill et al., 2022), no testbed-oriented analysis has characterized the team attributes of each testbed.
Addressing this gap would help researchers identify areas that require further exploration and development. Specifically, an overview of the testbeds used in empirical studies would enable researchers to understand which types of testbeds have been employed and to envision new testbeds that could tackle unresolved research questions. In response to this need, we conducted a literature review of human-agent teaming testbeds featured in prior research. We first created a team classification scheme for human-agent teams by adapting an existing classification scheme for human-human teams. We then applied this scheme to analyze 25 testbeds utilized in 68 empirical studies.
Our study not only identified the distribution of existing human-agent teaming testbeds but also highlighted areas of research that need further investigation. For instance, we found that a substantial portion of the human-agent teaming literature focused on teams comprising one human and one agent, with humans typically in a leadership role. Moreover, the dynamics within these teams tended to remain static over time. Our findings emphasize the importance of further research into diverse team attributes, such as team composition, leadership structure, and communication structure and medium. Such research would facilitate a deeper understanding of more complex team dynamics in human-agent teaming, potentially leading to more effective human-agent collaboration.
Methods
Development of Team Classification Scheme for Human-Agent Teams
We first developed a team classification scheme for analyzing human-agent teams. The scheme was derived from the comprehensive set of team-level characteristics for human teams proposed by Wildman et al. (2012), which we modified to suit the context of human-agent teaming. We then applied the scheme to analyze 25 testbeds drawn from the studies identified in a recent systematic review of the human-agent teaming literature (O’Neill et al., 2022).
Wildman et al. (2012) defined six fundamental characteristics of human teams: task interdependence, role structure, leadership structure, communication structure, physical distribution, and team life span. To adapt this framework to human-agent teams, we made several modifications. We retained five of these characteristics (task interdependence, role structure, leadership structure, communication structure, and team life span) because they are highly relevant to human-agent teams. However, we excluded physical distribution, given our focus on virtual agents.
Within the five retained attributes, we made two modifications, adding one category each to leadership structure and communication structure to better describe human-agent teams. First, we introduced a none type for the leadership structure attribute. This type is relevant to human-agent teaming because, in some experiments, neither a human nor an agent team member is assigned the leadership role (Frieder et al., 2021; Hoffman & Breazeal, 2007). Second, we added a dyadic type to the communication structure attribute, because many scenarios in the human-agent teaming literature involve a team comprising one human and one artificial agent (e.g., Hanna et al., 2015; Harriott et al., 2012). Since such teams have only two members, all communication occurs through dyadic interaction between the human and the agent.
Moreover, to better adapt the framework to human-agent teams (HATs), we incorporated four new attributes: team composition, authority differentiation, communication direction, and communication medium. Because HATs can be broadly defined by the number of humans and agents that form them, we included the team composition attribute. Leadership structure can be further analyzed in terms of who has the authority to make final decisions in situations such as disagreements (Hollenbeck et al., 2012). We therefore propose four categories for the authority differentiation attribute, depending on whether humans, agents, both, or neither hold that authority. Additionally, because the human-agent teaming literature is concerned with how humans and agents communicate, we added the communication direction and communication medium attributes to capture the flow and form of interactions between humans and agents. The resulting HAT classification scheme includes nine distinct attributes (refer to Table 1).
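The nine attributes lend themselves to a fixed record per testbed, with team composition derivable from member counts. The Python sketch below is one illustrative way to encode the scheme, not the authors' coding instrument. The names `CodedTestbed`, `composition_label`, and `AUTHORITY_CATEGORIES` are hypothetical, the rule that labels each side as "1" or "M" is an assumption, and the example labels in the comments are only those mentioned in this paper rather than the full category lists of Table 1.

```python
from dataclasses import dataclass, fields

# The four authority-differentiation categories named in the text.
AUTHORITY_CATEGORIES = frozenset({"human", "agent", "both", "neither"})


def _side(n: int) -> str:
    """Code a member count as '1' (exactly one) or 'M' (more than one)."""
    return "1" if n == 1 else "M"


def composition_label(n_humans: int, n_agents: int) -> str:
    """Map member counts to a team-composition label, e.g. (1, 3) -> '1-M'.

    Assumption: only conditions with at least one human and one agent are
    coded, mirroring the exclusion of all-human conditions in the text.
    """
    if n_humans < 1 or n_agents < 1:
        raise ValueError("need at least one human and one agent")
    return f"{_side(n_humans)}-{_side(n_agents)}"


@dataclass(frozen=True)
class CodedTestbed:
    """Hypothetical record holding one category per attribute for one testbed."""

    team_composition: str           # e.g. "1-1", "1-M"
    task_interdependence: str       # e.g. "sequential", "reciprocal", "intensive"
    role_structure: str             # e.g. "functional"
    leadership_structure: str       # e.g. "human", "none"
    authority_differentiation: str  # must be one of AUTHORITY_CATEGORIES
    communication_structure: str    # e.g. "dyadic", "hub-and-wheel", "chain"
    communication_direction: str    # e.g. "bidirectional"
    communication_medium: str       # e.g. "control-based", "speech"
    team_life_span: str             # e.g. "long-term", "ad-hoc"

    def __post_init__(self) -> None:
        # Only authority differentiation has a closed category list in the text.
        if self.authority_differentiation not in AUTHORITY_CATEGORIES:
            raise ValueError(
                f"unknown authority category: {self.authority_differentiation!r}"
            )


# Attribute names in scheme order; there are exactly nine.
ATTRIBUTES = [f.name for f in fields(CodedTestbed)]
```

For instance, `composition_label(1, 3)` returns `"1-M"`. Only authority differentiation is validated here, because it is the only attribute whose complete category list (human, agent, both, neither) is given in the text; the other fields are left as free-form labels.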
Table 1. Team Classification Scheme for Human-Agent Teaming.
Analysis of Human-Agent Teaming Testbeds
In this study, we used the review by O’Neill et al. (2022) to identify relevant empirical studies on human-agent teaming. However, whereas O’Neill et al. (2022) focused on the variables examined in these studies, our aim is to analyze the team characteristics featured in existing testbeds. Consequently, we excluded studies that did not describe their testbeds in sufficient detail, particularly how humans interact with agent team members to achieve a goal. This criterion led to the exclusion of five testbeds used in eight studies. After these exclusions, we analyzed a total of 25 testbeds featured in 68 empirical studies using the team classification scheme (Table 2).
Table 2. List of Testbeds.
Some testbeds have been used repeatedly across studies; CERTT II (Cooke et al., 2016; Demir et al., 2017) and MIX + RoboLeader (J. Y. C. Chen et al., 2011; J. L. Wright et al., 2016) were the most frequently used, each appearing in 15 studies. However, most testbeds appear in only one or two studies. Table 2 gives the number of studies featuring each testbed.
The testbeds reviewed in this study simulate a range of domains and objectives. Notably, many of them are set in military contexts, including army, aviation, and naval settings. The primary tasks in these testbeds often involve target search and elimination, navigation, and route planning. For example, in the MIX + RoboLeader testbed, human operators oversee unmanned vehicles to locate targets. In the TANDEM testbed, the team’s objective is to share information efficiently to decide whether to engage or clear each target. Some testbeds present relatively unique tasks: in BW4T, participants must collaboratively deliver blocks in a specific sequence, and in the Simulated Factory, the goal is to assemble 10 carts from various parts as efficiently as possible.
Throughout the coding process, the researchers relied on the explicit descriptions and illustrations provided in the papers. Rather than making inferences about the capabilities and underlying algorithms of computer-programmed agents, we documented the specific tasks assigned to both human participants and agents, as well as their actions during the experiments. When team composition served as an independent variable (for instance, when comparing the performance of all-human teams with that of teams including artificial agents, as in Fan et al. (2005)), we considered and documented only the conditions in which at least one agent was present. For the team life span attribute, we defined life span as the duration from the beginning to the end of the experiment. Two researchers independently reviewed and coded each testbed. Their results were then compared, and whenever disagreements arose, all authors discussed their perspectives until reaching full consensus. After coding, descriptive analyses were conducted to examine the distribution of the testbeds across each attribute of the team classification scheme.
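The comparison of the two coders' results and the subsequent descriptive analysis can be sketched as follows. This is a hypothetical illustration rather than the authors' tooling: the dictionary layout (testbed name mapped to attribute codes) and the function names `disagreements` and `distribution` are assumptions. The first function only flags the cells on which the two coders differ so they can be brought to the consensus discussion; it does not compute an inter-rater agreement statistic, since none is reported here.

```python
from collections import Counter
from typing import Optional

# Hypothetical layout: testbed name -> {attribute name: category label}.
Codes = dict[str, dict[str, str]]


def disagreements(
    coder_a: Codes, coder_b: Codes
) -> list[tuple[str, str, Optional[str], Optional[str]]]:
    """Return (testbed, attribute, code_a, code_b) wherever the coders differ.

    A cell that one coder left blank appears with None on that side.
    """
    diffs = []
    for testbed in sorted(coder_a.keys() | coder_b.keys()):
        a = coder_a.get(testbed, {})
        b = coder_b.get(testbed, {})
        for attribute in sorted(a.keys() | b.keys()):
            if a.get(attribute) != b.get(attribute):
                diffs.append((testbed, attribute, a.get(attribute), b.get(attribute)))
    return diffs


def distribution(consensus: Codes, attribute: str) -> dict[str, tuple[int, float]]:
    """Count and percentage of testbeds in each category of one attribute.

    Categories are listed from most to least frequent.
    """
    counts = Counter(codes[attribute] for codes in consensus.values())
    total = len(consensus)
    return {
        category: (n, round(100 * n / total, 1))
        for category, n in counts.most_common()
    }
```

With 25 consensus-coded testbeds, a category containing 11 of them is reported by `distribution` as `(11, 44.0)`, which matches the 44% one-to-one share reported in the Results.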
Results
The frequency analysis describes the distribution of human-agent team characteristics featured in previous literature. Certain team types were found to predominate, highlighting the need to investigate new forms of human-agent teams.
Specifically, regarding team composition, most testbeds pair a single human participant with one or more agents that usually serve subordinate roles. Among the 25 testbeds, 44% (n = 11) are designed as environments in which a participant interacts with one agent, referred to as the one-to-one (1-1) configuration. The second most common form is the one-to-many (1-M) configuration (36%, n = 9), in which a human participant completes a task with support from multiple agents. However, in several testbeds with multiple agents (1-M or M-M), many of the agents are subordinate autonomous vehicles with limited impact and engagement: they passively carry out assigned tasks without actively interacting with other team members (e.g., Chien et al., 2016; Mercado et al., 2016).
Regarding task interdependence, about half of the testbeds (n = 13) are classified as intensive, indicating a high level of interdependence among team members. The next most frequent type is reciprocal, which aligns closely with the one-to-one (1-1) team composition: although interdependence is substantial, interactions are confined to exchanges between two team members rather than involving multiple members at once. Additionally, four testbeds are categorized as sequential, in which team members’ activities unfold step by step, as on an assembly line, with each member’s actions depending on the preceding member’s completed work.
Concerning role structure, most testbeds (n = 16) were classified as functional, meaning that each team member has a distinct role that is not interchangeable with those of other members.
Leadership structure and authority differentiation also show a common pattern. In most human-agent teaming testbeds, the leadership structure is either human-led, with humans assuming leadership roles (n = 13), or none, with no member designated as leader (n = 10).
Furthermore, the most common communication structure is dyadic, because all 11 testbeds with a 1-1 team composition necessarily involve dyadic communication between their two members. Of these 11 testbeds, 10 involve bidirectional communication between the human and the agent. Six testbeds are coded as hub-and-wheel; in most of these (n = 5), one agent provides decision aid to a human while all the other agents follow the human’s commands, so that information flows in both directions between the human and the agents. No testbed is classified as having a chain structure. For communication medium, many testbeds fall into the control-based category, in which humans communicate with agents by operating devices such as a computer keyboard and mouse. Finally, with regard to team life span, all reviewed testbeds feature long-term teams.
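Because a team of one human and one agent can only communicate dyadically, the correspondence between 1-1 composition and dyadic communication structure can double as a sanity check on the coded data. A minimal sketch, assuming the same hypothetical layout of testbed name mapped to attribute codes; `dyadic_violations` is an illustrative name, not part of the original study.

```python
# Hypothetical layout: testbed name -> {attribute name: category label}.
Codes = dict[str, dict[str, str]]


def dyadic_violations(consensus: Codes) -> list[str]:
    """Names of testbeds coded as 1-1 whose communication structure is not 'dyadic'.

    Rule taken from the text: a one-human, one-agent team can only
    communicate dyadically, so any name returned signals a coding error.
    """
    return sorted(
        name
        for name, codes in consensus.items()
        if codes.get("team_composition") == "1-1"
        and codes.get("communication_structure") != "dyadic"
    )
```

An empty result means every 1-1 testbed is coded as dyadic, as is the case for all 11 such testbeds in this review. The check runs in one direction only: a dyadic code does not by itself imply a 1-1 composition under this sketch.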
Discussion
We conducted a comprehensive literature review of human-agent teaming testbeds featured in previous research. First, we developed a team classification scheme for human-agent teams by adapting an established human-human team classification scheme. We then used this scheme to analyze 25 testbeds utilized in 68 empirical studies.
Our proposed team classification scheme comprises nine attributes: team composition, task interdependence, role structure, leadership structure, authority differentiation, communication structure, communication direction, communication medium, and team life span. The scheme is useful because it supports a comprehensive, holistic understanding of team characteristics and dynamics in human-agent teaming.
The coding results for the existing testbeds provide insights, especially for researchers in this field, into areas that require further investigation and advancement. The frequency analysis suggests that certain types of teams predominate, indicating a need to design new types of teams. Specifically, in terms of team composition, the existing testbeds rarely feature multiple humans working with multiple agents (M-M). Designing more complex teams with diverse types of agents would enable the exploration of novel team interactions and dynamics.
The results regarding leadership structure suggest that other forms of leadership could be explored to diversify team dynamics and performance. For instance, it would be valuable to investigate team dynamics and performance when an agent takes on the leadership role or when humans and agents act as co-leaders. Larger teams could also offer opportunities to study more complex hierarchies and power dynamics among team members.
Communication dynamics also remain open for further exploration, especially in larger teams; the existing testbeds are dominated by dyadic and hub-and-wheel structures, and none uses a chain structure. In addition, broadening the range of communication media beyond control-based input to include speech, haptic, and gesture-based interactions would enable richer team communication.
Finally, the fact that all 25 testbeds are classified as long-term highlights how little team dynamics have been explored in ad-hoc teams, which are prevalent in real-world scenarios. This opens avenues for studying how human-agent team dynamics evolve as a team goes through multiple short-term tasks with changing team configurations. Overall, future studies in human-agent teaming could address a variety of new research questions by experimenting with different types of human-agent teams. Developing new testbeds that allow researchers to modify various team characteristics could significantly advance research in this area. Our work can inform such efforts, whether researchers are developing new testbeds or designing novel experiments with existing ones.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This paper is based upon work supported by the National Science Foundation under Grant No. 2045009.
