Abstract
Extreme teams (ETs) work in challenging, high-pressure contexts, where poor performance can have severe consequences. These teams must coordinate their skill sets, align their goals, and develop shared awareness, all under stressful conditions. How best to research these teams poses unique challenges as researchers seek to provide applied recommendations while conducting rigorous research to test how teamwork models work in practice. In this article, we identify immersive simulations as one solution to this challenge, outlining their advantages over existing methodologies and suggesting how researchers can best make use of recent advances in technology and analytical techniques when designing simulation studies. We conclude that immersive simulations are key to ensuring ecologically valid and empirically reliable research with ETs.
Keywords
“Extreme teams” (ETs) operate in challenging environments in which there are considerable physical, psychological, and interpersonal demands (Manzey & Lorenz, 1998). ETs share many similarities with “High Reliability Organisations,” in which teams are required to operate effectively, in complex task environments, and for sustained periods of time (Klein et al., 1995; Roberts, 1990). What both contexts have in common, and what defines an ET, is that they operate in atypical environments (in terms of demands/stress levels), in which ineffective performance can have severe, potentially life or death, consequences (Bell et al., 2018). Examples of ETs include those involved in long-duration space flights (Zhang et al., 2018), submarine command and control rooms (Stanton & Roberts, 2018), medical emergencies (Klein et al., 2006), high-risk industries (Sneddon, Mearns & Flin, 2006), and emergency response (Power & Alison, 2017a). Interest in ETs is increasing (see Driskell et al., 2018; Roma & Bedwell, 2017), with teamwork viewed as a vital component to organizational success and safe working practices (Hughes et al., 2016; Mazzocco et al., 2009). This has led to a consideration of how to study these unique, often hard to reach teams, to conduct rigorous applied research that contributes to wider theoretical understanding (Bell et al., 2018; Kozlowski, 2015). Given the unique context in which ETs operate, this understanding may diverge from what we know about conventional teams and challenge our current thinking. We identify immersive simulations as one way to achieve this and present a framework for designing, conducting, and analyzing this research, drawing on current research and ethnographic experience.
Researching ETs
Teamwork is essential for safety and success in extreme environments (Hughes et al., 2016). For example, research in high-risk industries has shown that accidents occur more often due to problems between team members than unsafe working conditions (Dwyer & Raftery, 1991), a finding that has been attributed to issues around poor leadership (McCabe et al., 2008) and a lack of team spirit (Kadiri et al., 2014). Risser et al. (1999) also showed from 54 incidents across 8 U.S. hospital emergency departments that half of all recorded deaths and permanent disabilities could have been prevented through better teamwork. Identifying solutions to improve teamwork in ETs can be challenging. This is because they have complex team structures, often form (and dismantle) rapidly, draw on multiple agencies, and operate in dynamic conditions that impose a high level of stress on members due to the severe consequences of poor teamwork (Crichton et al., 2000; Schmutz et al., 2018). These features are different from what we see in conventional teams and suggest that theoretically, their processes may be structured differently.
Research on teams requires careful consideration of the complex interplay between performance and its antecedent factors that reside at four levels: the individual (e.g., personality), the team (e.g., team structure: horizontal or vertical), cultural (e.g., organizational culture), and contextual (e.g., task demands). Each of these levels, in isolation and in combination, influences how well a team adapts and responds to a situation. When applied to ETs, an extra layer of complexity is added when we consider the extent to which psychological pressures (e.g., stress) interact with each of these levels and alter team performance (Driskell et al., 2018). The experience of stress can create a perception that task demands exceed available resources, which can lead to undesirable physiological, psychological, behavioral, and/or social outcomes (Salas et al., 1996). These demands may reside in conventional teams to a lesser extent (or not at all), or in a qualitatively different way (e.g., relating to performance rather than the loss of life). Differences in contextual demands can drive the type of stress experienced in teams, which may change or amplify the drivers of effective teamwork (Driskell et al., 2018; Maynard et al., 2018). Considering this, researchers have called for empirical research with ETs to test if theoretical models developed with conventional work teams apply to those working in these challenging settings (Vessey & Landon, 2017), and to develop solutions that can protect workers and enhance performance (Power, 2018).
Simulation research with ETs
Researchers looking at ETs have employed a variety of methods to understand their composition, function, and processes. When the research question concerns a descriptive understanding of ETs, qualitative methods such as observations and interviews (used in isolation or together) have been shown to be effective. Gillespie et al. (2013), for example, developed an ethnographic account of surgical teamwork culture using observations and interviews. Power and Alison (2017b) identified nine core challenges for commanders during emergencies using interviews. When the research question concerns the influence of self-perceptions on teamwork, self-report measures such as questionnaires have been used. Wauben et al. (2011) found differences between medical team members in the way they perceived nontechnical skills (e.g., communication and situation awareness) using a questionnaire survey. However, what these studies do not do, and what is distinct in simulation studies, is manipulate specific variables to test theory and generate empirical evidence of how these variables influence team performance. While the manipulation of variables is possible in traditional laboratory studies, these studies often utilize student samples in a setting that is void of the stressors present in an extreme environment (e.g., Zaccaro et al., 1995). Further, research highlights the importance of expertise in extreme environments (Boulton & Cole, 2016), thus suggesting that understanding how practitioners work in the real world necessitates that research is undertaken with the population of interest.
One effective method for studying ETs is simulations. Simulations allow for the measurement of complex relationships between factors that impact team performance in a meaningful organizational context, while facilitating a high level of experimental control (Alison, van den Heuvel, et al., 2013; Manser et al., 2007). Example relationships may include the impact on performance of individual differences (e.g., attitudes), trust between team members, and temporal patterns in teamwork, as well as cultural and contextual variables that may moderate these relationships, such as organizational norms and task demands. Studies that have used simulations to answer such questions include Bienefeld and Grote (2014), who showed the influence of expertise and organizational knowledge on leadership behaviors in aviation teams, and Amacher et al. (2017), who demonstrated that all-female medical teams showed less “hands-on” time and a greater delay before chest compressions in comparison to all-male teams.
In comparison to alternative methods, simulations have five key benefits. They (i) recreate the stressors and challenges of the workplace, (ii) involve data collection with the population of interest (i.e., practitioners instead of students), (iii) provide an opportunity for researchers to test theory by manipulating and measuring discrete variables, (iv) allow for the collection of rich quantitative and qualitative data related to team behavior in real time, and (v) can be used as a training tool to increase participation (Rosen et al., 2008). Simulations are an especially useful platform for collecting data with ETs as they provide a physiologically and psychologically safe space that will not endanger participants (Alison, van den Heuvel, et al., 2013), while eliciting similar behavioral patterns as would be found in situ (Manser et al., 2007). They are also suited to research with ETs who may be difficult to study using alternative methods (e.g., the security-sensitive nature of military command and control would preclude an observation study).
This article has two main aims. First, it seeks to show the utility of immersive simulations in studying a range of ETs, not just those who operate in health care, where many of the frameworks and benefits of utilizing immersive simulations originate (see Cheng et al., 2016; Cheng et al., 2014). We will show in this article that they can also be used in contexts where ETs are less well structured (e.g., multiteam systems (MTS)), more fluid (e.g., nonstable team members), and involve both horizontal (i.e., within an operational team) and vertical (i.e., between operational, tactical, and strategic teams) organizational structures. Second, this article will outline recent technological and analytical advances in psychological research and consider how simulation research can be improved by utilizing more immersive methods that can better harness these advances. For example, we consider how emerging virtual reality (VR) technologies or alternative statistical approaches (i.e., Bayesian statistics) might be used to allow advanced models of ETs to develop. These developments have implications beyond the ET context and hold promise for team research in general. In this article, we address these aims by outlining a framework for using immersive simulations for research with ETs, broadly focusing on three aspects of the research lifecycle: (i) simulation design, (ii) data collection, and (iii) data analyses.
Simulation design
A simulation seeks to create a testing environment that closely replicates reality (Sleeper & Thompson, 2008). An important consideration during research design is how to embed fidelity and immersion so that participants feel engaged in the simulation and exhibit similar behaviors as would be found in situ. Fidelity and immersion are two interrelated constructs that seek to increase the sense of realism during a simulation (Alison, van den Heuvel, et al., 2013; Lester et al., 2017), and which determine the success of simulations. Fidelity is the extent to which the simulation matches the real-world environment (Maran & Glavin, 2003). This influences the level of immersion felt by the participant, defined as the “subjective impression that one is participating in a comprehensive, realistic experience” (Dede, 2009, p. 66). Fidelity can be created at the physical and psychological levels. Physical fidelity refers to the extent to which the simulation reflects the material aspects (i.e., a physical replica) of the working environment (Lester et al., 2017). It is based on the principle that the more similar the simulated task environment is to the real environment, the greater the transfer of learning (Baldwin & Ford, 1998). Psychological fidelity refers to the degree to which the skills and behaviors necessary to complete organizational tasks are accurately represented in the simulated environment (e.g., does the task evoke a similar level of cognitive processing) (Bradley, 2006). Psychological fidelity is expected to elicit similar psychological processes necessary for real-world performance (Kozlowski & DeShon, 2004). The decision on whether to maximize physical fidelity, psychological fidelity, or both during research design is dependent upon the research questions of interest.
Physical fidelity is important when a level of “dexterity” is needed by the target population to complete the task (Dieckmann et al., 2007). It allows the transfer of procedural skills that might not be possible using psychological fidelity methods alone (Hochmitz & Yuviler-Gavish, 2011) and is especially important when the research question concerns an interplay between humans and hardware (e.g., does a new piece of kit promote faster teamwork?). Understanding the interplay between humans and hardware, referred to as a “sociotechnical system” (Baxter & Sommerville, 2011), is important for ETs as their context becomes increasingly digitized. ETs where this will be important include control room operators, flight crews, and emergency medical teams. For example, Stachowski et al. (2009) used an exact replica of a nuclear control room to study the adaptability of teams as they moved through the testing space, communicating and sharing information with colleagues while interacting with the electronic displays to rapidly find faults and implement changes to systems (Waller & Kaplan, 2018). Although essential for certain ETs (e.g., operational teams that need to interact with hardware), creating physical fidelity through physical replicas can be difficult as they are often expensive, take up a large amount of physical space, and are often not portable (Kozlowski & DeShon, 2004).
Psychological fidelity is important for researchers interested in studying nontechnical skills in ETs (e.g., trust, decision making, sensemaking), or teams operating at strategic levels. It allows for the examination of the interplay between individual and contextual factors on intrateam processes (Kozlowski & DeShon, 2004). For example, researchers interested in the effects of psychological stressors (e.g., task-related anxiety) on team communication and coordination might build reactionary consequences into the simulation design to increase the gravity of decisions and sense of accountability of decision makers (Eyre et al., 2008). This might be achieved by gathering team members round a board room style table and providing them with real-time information that follows a realistic narrative to an unfolding situation (e.g., video calls from simulated team members, PDFs with “data” related to the simulation exercise). An example of where this has been used successfully is Power and Alison (2017a). They ran a simulation study examining how a team of emergency service commanders made decisions during a simulated terrorist incident in which different injects were presented to team members depending on their answer during the previous inject. This enabled participants to feel immersed by embedding consequences for choices, increasing the gravity of decision making.
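The branching design described above, in which the next inject depends on the team's previous answer, can be sketched as a simple lookup structure. The Python sketch below is illustrative only: the `Inject` structure, the scenario content, and the `next_inject` helper are hypothetical and are not taken from Power and Alison (2017a) or from any particular simulation platform.

```python
from dataclasses import dataclass, field


@dataclass
class Inject:
    """One piece of scenario information shown to the team (hypothetical structure)."""
    inject_id: str
    message: str
    # Maps each permitted team response to the id of the inject that follows it.
    branches: dict = field(default_factory=dict)


# Hypothetical scenario content, for illustration only.
SCENARIO = {
    "start": Inject(
        "start",
        "Reports of an explosion at a rail station.",
        {"deploy": "casualties", "hold": "escalation"},
    ),
    "casualties": Inject(
        "casualties", "Responders report multiple casualties on the concourse."
    ),
    "escalation": Inject(
        "escalation", "Media report that no responders are on scene."
    ),
}


def next_inject(current_id, response):
    """Return the inject that follows `response`, or None if the branch ends."""
    following_id = SCENARIO[current_id].branches.get(response)
    return SCENARIO[following_id] if following_id is not None else None
```

Keeping the branches as data rather than as nested conditionals means practitioners could, in principle, review and amend the narrative without touching the logic that presents it.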
Recent advancements in VR software offer an accessible and highly immersive way to achieve both physical and psychological fidelity. VR environments are “computer-generated simulations of three-dimensional objects or environments with seemingly real, direct or physical user interaction” (Dionisio & Gilbert, 2013, p. 2). They offer an affordable alternative to physical replicas of the organizational environment, while still testing important teamwork processes in a context that mirrors the decisions and challenges present in the workplace (Pan & Hamilton, 2018). VR simulations can therefore be used to test both operational (e.g., physical tasks) and strategic teamwork (e.g., decision making).
One example of a VR system is the Cave Automatic Virtual Environment (CAVE). CAVE comprises an enclosed cube, sitting within a large darkened room with projectors on each side (Cruz-Neira et al., 1992). CAVE is attractive as the goggles that are worn do not stop participants from seeing their own hands (as with most head-mounted VR devices), while they interact with the VR projected on the screens. This means that participants can interact with physical objects (e.g., enact driving by using a real steering wheel) (Pan & Hamilton, 2018), allowing researchers to examine the ability of teams to perform physical tasks. This is especially important when researching ETs that are required to complete arduous physical tasks (e.g., search and rescue teams) and may offer some insights into how contextual demands can influence team members’ ability to use specialist equipment. For example, CAVE has been used to train firefighters using Breathing Apparatus Entry search methods, that is, searching a building for casualties in which sight and breathing are restricted by smoke (Backlund et al., 2007). In their study, participants wore personal protective equipment, and sensors were fixed to the walls so that physical movements within the “CAVE” corresponded to their movement and orientation within the simulation. This increased the physical effort needed to complete the tasks, giving participants a sense of real-world orientation while in a virtual world.
The use of CAVE is not widespread generally (see Jiang et al., 2016), and this is especially so in relation to ETs. This may be attributed to the fact that it is relatively expensive and difficult to transport in comparison to other VR systems such as head-mounted displays (Mallaro et al., 2017). However, evidence from other areas has shown its potential utility for understanding ETs. Gamble et al. (2018) utilized the CAVE system to explore friend/foe discriminatory fire in military personnel, where they found that participants made more errors when under stress, but that “expertise” was a protective factor. There is also evidence from its use in social psychology that it may be used to explore the role of social influence on individual behavior. For example, Kinateder et al. (2014) showed that the presence of a virtual agent significantly affected route choice in the evacuation of a tunnel fire. Applied to ETs, the potential for unpacking social influence suggests that CAVE may help develop our understanding in areas such as how intra- and interteam communication influences performance in MTS. At present there is limited understanding of how behaviors at the intrateam level affect interteam performance and vice versa (Asencio & DeChurch, 2017). In the immediate term, and commensurate with the current capability of this kit, we would expect ET research utilizing this technology to focus on questions that do not require data to be collected from multiple team members in parallel. In the longer term, and as this technology advances, we see potential for the CAVE system to study the interaction between multiple individuals, in addition to its current capability of studying the interaction between participants and virtual agents.
When designing a simulation, the involvement of practitioners and/or experts is invaluable. They can ensure that the simulation is relevant to organizational tasks (Klein & Woods, 1993), provide expert input about the task environment and narrative, increase the likelihood that the simulation will elicit similar cognitive and emotional responses found in the real world (Crandall et al., 2006), and help to ensure that simulations offer both research and training benefits, which can facilitate participant recruitment and engagement (Rudolph et al., 2007; Waller & Kaplan, 2018). This makes simulations attractive to end users as they provide a space in which team skills can be trained, facilitating recruitment that might otherwise be challenged by the high workloads and small populations of participants (Beaubien & Baker, 2004).
However, it is important to ensure a balance is met between research and training goals. Simulations can be resource intensive and it is important that researchers are not prevented from collecting the data they need to answer their research questions and that practitioners are not promised a simulation that fails to meet their training objectives. To do this, researchers must delineate what the training goals of the organization are during the early phases of design, and work around them to ensure training objectives are compatible with research goals (Dieckmann et al., 2007). This should facilitate an interdisciplinary partnership and enable collaboration through the entire research project. The involvement of practitioners at the early stages of research can also have benefits later on in terms of research dissemination and impact. Practitioners are keen to receive feedback on their training; as such, a research team might want to organize a feedback workshop or write a practitioner-friendly report on findings. This can facilitate opportunities for further follow-up studies and ensure a collaborative relationship with practitioners moving forward.
Data collection
A key benefit to simulation research is that it facilitates the collection of rich behavioral data, allowing researchers to study the verbal and nonverbal dynamics of teamwork. Psychology has seen a decline in the use of behavioral measures in recent years, typically showing a tendency to use self-report surveys (Cialdini, 2009; Dolinski, 2018). However, there has been a general call to move beyond self-report measures to gain a better understanding of how social coordination emerges in complex environments (Willemsen-Dunlap et al., 2018) and to develop more objective measures of behavior (Rosen & Dietz, 2017). This is due, in part, to the limitations of solely using self-report measures which (i) fail to account for the richness of team-based interactions (Shuffler & Carter, 2018), (ii) lead to a proliferation of scales each attempting to measure the same thing (see Salas et al., 2015, on team cohesion), (iii) show weak correspondence with non-self-report outcome measures (see Valentine et al., 2015, for a review in a health-care setting), and (iv) are subject to a number of biases (e.g., Podsakoff et al., 2003). We suggest that simulations offer a methodological advantage to self-report by recording behavior in situ.
Wearable technology
The tools used to collect data during simulations need to be unobtrusive so as not to break immersion, but robust enough to allow for reliable examination of the research question. The advancement of behavioral measures creates promise for the use of wearable sensors in research using simulations. Wearable sensors are mobile devices that record data on how the wearer interacts with their surroundings (including other people). They do this using microphones, accelerometers, infrared sensors, and/or Bluetooth components (Chaffin et al., 2017). Wearable sensors have two advantages over traditional methods: they record data from participants with little effort and without relying on self-report, and they collect data continuously and in real time, removing the need for researchers to piece together static data taken at set times, sometimes from multiple devices. This makes wearable sensors especially suited to simulations, as the continuous collection of rich data in the real world may lead to consent and confidentiality issues (e.g., recording patient–clinician interactions).
The fact that behavioral data are collected continuously means that wearable devices have the potential to identify important within-person insights and their impact on team performance. This has not always been achieved with traditional methods, which tend to focus at the between-person level (Matusik et al., 2019). This finer grained understanding of how teams operate gives simulation methods the potential to model complex, nonlinear relationships between relational variables. For example, data from wearable sensors may allow for the development of a finer grained understanding of leadership in ETs, such as how a leader’s behavior fluctuates across an emergency and how these fluctuations impact team behaviors. Similarly, such data may be used to examine how leadership changes interact with team factors (e.g., the presence of other teams, as within MTS) or external forces (e.g., contextual demands during crisis response).
At a theoretical level, wearable sensors are most valuable when the research question concerns relational issues at the team level (e.g., cohesion, trust, leadership), as they show how the person navigates their environment, including social interactions. Using data from single or multiple streams (e.g., audio, Bluetooth), studies have used wearable technology to examine affect and team cohesion in simulated space exploration missions (Zhang et al., 2018), cooperation (Taylor, 2013), communication in productive and creative teams (Pentland, 2012), social and task-related exchanges (Matusik et al., 2019), social networks (Wu et al., 2008), boundary-spanning individuals (i.e., those that coordinate activity between established groups) (Chaffin et al., 2017), and emergent leaders (Chaffin et al., 2017). There is potential for research in ETs to build on this to use sensors in the study of MTS, to explore how boundary-spanning individuals support teamworking across multiple agencies responding in crises. Previous research has tended to rely on self-report and coding of verbal behaviors (see Bienefeld & Grote, 2014), whereas wearables can measure other aspects such as variations in proximity over time (i.e., using Bluetooth), in addition to providing a continuous measure of communication.
Research using audio data more generally expands the potential for wearables in simulation research. For example, Stanton and Roberts (2018) used audio data to understand team-level macrocognition (i.e., cognitive functions that are performed in naturalistic settings, see Klein et al., 2003), Bowers et al. (1998) have used it to understand shared mental models, and Fischer, McDonnell, and Orasanu (2007) have used it to examine which types of information (task or relational focused) best support performance in ETs. From the perspective of understanding ETs, this is especially promising as the nature of these environments means that team members have to share, analyze, and discuss complex information (e.g., Haddow & Bullock, 2003). An important question for ETs, due to the time-sensitive nature of their work, is how to do this efficiently. Evidence from a range of non-ETs suggests that short and equal verbal contributions, face-to-face communication, distributed connections within the team, and information seeking from other teams characterize success (Pentland, 2012). Wearable sensors would allow for a reliable test of these hypothesized effects in ETs, while maintaining the realism of the ET environment through the use of the simulation.
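As an illustration of how one of these hypothesized indicators, the equality of verbal contributions, might be quantified from wearable audio data, the following Python sketch computes each speaker's share of speaking time and a normalized Shannon entropy of those shares. It is a minimal sketch under stated assumptions: the input format (speaker, start, and end times in seconds, as might be produced by speaker segmentation) and both function names are hypothetical, and normalized entropy is only one of several possible evenness measures; it is not the measure used by Pentland (2012).

```python
import math
from collections import defaultdict


def speaking_shares(turns):
    """Return each speaker's share of total speaking time.

    turns: list of (speaker, start_s, end_s) tuples (assumed input format).
    """
    totals = defaultdict(float)
    for speaker, start_s, end_s in turns:
        totals[speaker] += end_s - start_s
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: t / overall for speaker, t in totals.items()}


def contribution_evenness(turns):
    """Normalized Shannon entropy of speaking shares.

    1.0 means all observed speakers contributed equally; values towards 0.0
    mean one speaker dominated. Members who never speak do not appear in
    `turns` and are therefore not counted (a limitation of this sketch).
    """
    shares = speaking_shares(turns)
    if len(shares) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    return entropy / math.log(len(shares))
```

Because the score is computed from time-stamped turns, it could also be calculated over successive windows of a simulation to examine how evenness changes as an incident unfolds.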
Wearable technology can be used to record physiological data from team members. Psychological pressure (e.g., stress) is an important factor to consider in simulation research of ETs as the inherently stressful environments they operate in can disrupt performance (Driskell et al., 2018). Stress has been shown to reduce communication in aviation teams (Sexton et al., 2000), impair cognitive functioning in military teams (Wallenius et al., 2004), and reduce information sharing in less experienced surgical teams (Wetzel et al., 2006). Although previous research has explored the role of stress in ETs, studies have often failed to check whether the experimental manipulation has actually affected stress levels or, alternatively, used a self-report survey to do so. For example, increasing stress by imposing time pressure has been associated with an increase in risk-taking behavior (Young et al., 2012) and a shift toward more satisficing decision styles (Alison, Doran, et al., 2013). However, neither of these studies took physiological measures of stress from their participants and so the effects of stress, via time pressure, were assumed.
Wearable technology allows us to address the limitations of these other studies. It is possible to measure stress during a simulation by using wearables that record “stress-related” measures, such as heart rate, galvanic skin responses, and change in pitch (Mozos et al., 2017). For example, skin conductivity (i.e., sweating) and heart rate recorded during a simulated driving task have been found to predict stress levels with the highest level of accuracy when compared against physical indicators of stress (e.g., braking and sharp turning) and self-report measures (Healey & Picard, 2005). Heart rate has also been identified as the best indicator of stress in a study comparing physiological indicators of stress during a simulated virtual environment that invoked fear by placing participants over a chasm at great height (Meehan et al., 2002). When applied to ETs, physiological indicators of stress open the possibility of building models that map team responses across a stress episode: from its origin through peak to end. What sets these models apart from conventional teams (where such devices are equally insightful) is the potential for ET models to overlay the stress episodes experienced by interrelated teams (e.g., MTS) to examine interplay or contagion.
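The idea of mapping a stress episode from its origin through peak to end can be illustrated with a simple threshold rule applied to time-stamped heart rate samples. The Python sketch below is an assumption-laden illustration rather than a validated method: the function name, the use of a resting baseline, and the default margin of 15 beats per minute are all hypothetical choices, and heart rate also rises with physical exertion, so a real analysis would need to account for activity.

```python
def stress_episode(samples, baseline_bpm, margin_bpm=15.0):
    """Locate the first episode in which heart rate exceeds baseline + margin.

    samples: list of (time_s, bpm) pairs in time order (assumed input format).
    margin_bpm: illustrative threshold above baseline, not a validated value.
    Returns (onset_s, peak_s, end_s), or None if the threshold is never exceeded.
    """
    threshold = baseline_bpm + margin_bpm
    onset = peak_time = end = None
    peak_bpm = float("-inf")
    for time_s, bpm in samples:
        if bpm > threshold:
            if onset is None:
                onset = time_s  # first sample above threshold
            if bpm > peak_bpm:
                peak_bpm, peak_time = bpm, time_s
            end = time_s  # last sample seen above threshold so far
        elif onset is not None:
            break  # heart rate fell back to the threshold or below
    return None if onset is None else (onset, peak_time, end)
```

Running the same function over the samples of members from interrelated teams would yield episode boundaries on a common timeline, which is one simple way to begin overlaying episodes across an MTS.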
In keeping with the need to maintain fidelity during simulations, researchers may also consider using physiological measures of stress to provide an objective indication of how immersive a simulation has been. Baker et al. (2017) used a heart rate monitor to assess if the stress experienced in medical procedures could be replicated within a simulated environment and found that the simulated procedure did not accurately recreate the same level of stress as experienced within hospitals. This emphasizes the need to incorporate a physiological measure of stress to ensure that elements of the simulation that are intended to be difficult induce a level of urgency within the participants. There is currently a lack of research that has sought to establish what stress levels are needed to ensure that simulations are useful for training and research purposes (Cumin et al., 2013). More research is needed to establish standardized levels of immersion which can leave organizations confident that simulations are achieving their intended purposes (Cumin et al., 2010).
Interactions within the simulation system
Although wearable sensors have the potential to provide rich data on relational issues at the team level, they may not be able to provide a holistic overview of teamwork, such as when communication occurs via other mediums (e.g., email) or when interdependent tasks are carried out in different locations. For example, some ETs (e.g., MTS) will operate across several sites and researchers may wish to explore how cultural factors (e.g., organizational policies) and team structures facilitate/hinder interteam processes. One benefit of simulations is that teams are operating in designated room(s), and so forms of data collection can be built into the simulation system to provide a comprehensive account of verbal and nonverbal communication between team members. Data gathered from participant interactions within the simulation system might include video recording, for example, CCTV of the team operating in the simulation room; or recording data within the simulation computer system itself, for example, by generating a log of clicks or button pushes when participants interact with the simulation; collecting time-stamped “decision logs”; and eye tracking on the computer screen. Monitoring the interaction within the simulation system may prove particularly important for researchers interested in designing a simulation with high physical fidelity to explore sociotechnical systems (e.g., how team members interact with the computer system). Future research could consider how simulations with high physical fidelity might advance theory on sociotechnical systems and their use by ETs. For example, research could consider the role of the team in increasingly automated systems, or the ways in which contextual demands (e.g., dynamic task requirements) impact team members’ ability to effectively utilize technology in crises.
The type of data recorded in the simulation system will be dependent on the system being used and the research questions of interest. For example, research questions that are interested in how team-level factors (e.g., composition) influence decision speed might use a time-stamped “decision log.” Power and Alison (2017a) used this method to identify how long it took teams to make decisions and how this interacted with the team’s goal. Teams were requested to “log” their decisions on a computer when they wanted to make a decision and these data were automatically recorded and time stamped in the simulation system. Alternatively, researchers may use the simulation system to monitor how team members communicate electronically with one another. Alison, Power, van den Heuvel, Humann, et al. (2015) were interested in communication patterns between subteams in different “syndicate” rooms in a simulation. To do this, they built a “chatbox” function into the simulation system so that subteams could communicate between rooms, with all electronic communications data recorded and time stamped. The simulation system, therefore, offers an alternative mode of data collection that can be used in isolation or in conjunction with wearable devices dependent on the research question.
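To make the idea of a time-stamped decision log concrete, the following minimal Python sketch computes how long each team took to log its decisions after a scenario event. It is an illustration only: the log format, the team labels, and the function name `decision_latencies` are hypothetical and are not taken from the simulation systems used in the studies cited above.

```python
from datetime import datetime


def decision_latencies(inject_time, log):
    """Return seconds from a scenario inject to each logged decision, per team.

    `log` is a list of (ISO timestamp, team, decision text) tuples. This
    layout is a hypothetical format chosen for illustration.
    """
    start = datetime.fromisoformat(inject_time)
    latencies = {}
    for timestamp, team, _decision in log:
        elapsed = (datetime.fromisoformat(timestamp) - start).total_seconds()
        latencies.setdefault(team, []).append(elapsed)
    return latencies


# Invented example data: three decisions logged by two agencies.
log = [
    ("2024-01-01T10:02:30", "police", "set up cordon"),
    ("2024-01-01T10:04:00", "fire", "commit crews"),
    ("2024-01-01T10:07:15", "police", "evacuate street"),
]
result = decision_latencies("2024-01-01T10:00:00", log)
```

Latencies of this kind could then be related to team-level factors (e.g., composition or goal type), depending on the research question.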
Data analysis
Simulation research with ETs has the potential to yield vast amounts of data from multiple sources, measuring multiple variables. It is important that data analysis maximizes understanding of these rich data. A number of methods of analysis can be used. Here, we focus on two types that are especially relevant: (i) network analyses, which examine interpersonal dynamics within a team at a single time point; and (ii) temporal analyses, which track interpersonal dynamics over time. We focus on these methods as they provide rich representations of team interactions, as opposed to assessing the individual performance of team-based skills (e.g., Yule et al., 2008). We then turn our attention to the possibility of using Bayesian statistics, which allow analyses to be carried out with smaller samples, and thus may open up the possibility of testing more complex theoretical models in ET research.
Network analyses
Network analyses allow a researcher to analyze team behavior during simulations by quantifying information and providing a visual representation of how team members interact. This type of analysis is especially useful when comparing how contextual factors (e.g., task type) influence team behaviors (e.g., interteam communication) (Stanton & Roberts, 2018). Using recorded communication data (e.g., by using wearable devices or CCTV recordings), social network analysis (SNA) shows how team members communicate with each other and the centrality of any one member (Knoke & Yang, 2008). SNAs are also useful as they provide a visual representation of the social dynamics of a team by plotting each person as a node and showing the strength of the connections between them. At a theoretical level, this is especially important for ETs that involve multiple agencies operating within a hierarchical structure as it can identify instances in which communication patterns do not follow predefined organizational processes and structures (Dekker, 2000), or plausible reasons for communication breakdowns. For example, SNA has been used to identify key tasks that challenged communication in submariners (Stanton & Roberts, 2018), how team communications varied dependent on team composition in surgical operating staff (Anderson & Talsma, 2011), and how a lack of connectedness within a search and rescue team contributed to faulty communications and impaired the team’s ability to develop shared situation awareness (Fodor & Flestea, 2016).
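As a rough illustration of how SNA quantifies communication, the Python sketch below builds a weighted network from a list of sender-receiver events and computes a simple degree centrality (the proportion of other team members with whom each person has exchanged messages). The event list, role labels, and function names are hypothetical, and degree centrality is only one of several centrality measures a researcher might choose.

```python
from collections import Counter


def communication_network(events):
    """Count directed messages for each (sender, receiver) pair."""
    return Counter(events)


def degree_centrality(events):
    """Proportion of other team members each person has communicated with."""
    partners = {}
    for sender, receiver in events:
        partners.setdefault(sender, set()).add(receiver)
        partners.setdefault(receiver, set()).add(sender)
    n = len(partners)
    if n < 2:
        return {person: 0.0 for person in partners}
    return {person: len(others) / (n - 1) for person, others in partners.items()}


# Invented example: who spoke to whom during one phase of a simulation.
events = [
    ("commander", "medic"), ("commander", "fire"), ("commander", "police"),
    ("medic", "commander"), ("fire", "commander"), ("police", "fire"),
]
weights = communication_network(events)   # tie strength per directed pair
centrality = degree_centrality(events)    # the commander is linked to everyone
```

In this invented example the commander is connected to every other member, whereas the medic communicates only with the commander, which is the kind of pattern a network plot would make visible.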
An alternative type of network analysis that goes beyond communications data is the Event Analysis of Systematic Teamwork (EAST) technique. This method models the macrocognition (i.e., situation awareness) of a team by generating task and information networks in addition to social networks (Walker et al., 2006). In order to perform EAST, raw data from audio and video recordings are transcribed and then used to create matrices of each of the three networks (i.e., social, task, information). This results in a “network of networks,” that allows researchers to identify how constructs in different networks might interrelate. For example, communications might influence the way a task is performed, which might influence how information is transferred.
EAST has been used to examine teamwork in simulation research across several extreme contexts: submariner command and control (Stanton & Roberts, 2018), emergency response (Houghton et al., 2006), and air traffic control (Walker et al., 2006). As EAST involves generating a task network, it is useful for researchers who are interested in understanding how team members coordinate to complete tasks as well as how they communicate with one another in extreme environments. Hierarchical task analysis is a methodology within EAST that is used to identify key tasks (Annett & Stanton, 2000), as well as the individuals who complete tasks, the structure, and the order in which the tasks take place (Walker et al., 2006). This provides a detailed representation of how team goals interact and are resolved (Walker et al., 2010). For example, a simulation researcher interested in team coordination may want to model how a team approaches different tasks depending on difficulty. As coordination is defined as the behavioral mechanism enabling teams to sequence, synchronize, and integrate their efforts in order to achieve goal-relevant tasks (Marks et al., 2001), modeling how teams move through tasks should contribute to a more complex understanding of how ETs coordinate. This is extremely relevant for researchers interested in ETs due to the importance of coordination in managing complex team structures and preventing error across a range of contexts such as aviation (Grote et al., 2010) and medical emergency teams (Schmutz & Manser, 2013).
Temporal analysis
Temporal analysis seeks to identify how team behavior might change over time in response to changes in individual, team, and contextual demands. This type of analysis is especially useful for ET researchers interested in exploring how team processes emerge and are sustained during simulated tasks. It recognizes the important role of context in shaping team-based interactions (Ilgen, 1999), emphasizing that teamwork does not exist in a vacuum and team processes will change over time (Kozlowski & Ilgen, 2006). Nonsimulation-based team research has sought to study how teamwork changes over time by collecting longitudinal data (e.g., questionnaires) at set intervals over a given period (see Mathieu et al., 2015). However, this staged approach might not be feasible for some ETs as team members rotate and might not work together at set regular intervals (e.g., emergency response teams). Moreover, these approaches tend to rely on self-report data, as opposed to monitoring actual behavior in real time, which has limitations as detailed above (Shuffler & Carter, 2018).
An alternative approach is to study how team behavior evolves during a simulation. Although simulations will not produce “longitudinal” temporal data in the traditional sense (e.g., over a course of weeks/months/years), simulations offer a closer replica of how ETs operate in the real world, wherein they must adapt and evolve their teamwork during a given task (e.g., emergency incident). As such, simulations allow us to study the temporal dynamics of teamwork during a simulated “event,” which can incorporate multiple goal-directed tasks and episodes (Marks et al., 2001). By analyzing simulation data longitudinally (i.e., over the course of the simulation), researchers can explore how teams adapt and change as they cycle through different episodes within the simulated event (Marks et al., 2001). The advent of wearable devices, and continuing advances in this technology, allows this to be done in a reliable and highly detailed way, enabling researchers to begin examining complex, nonstatic theories or models of behavior. This could be especially important to advance understanding of MTS. For example, wearable devices may be used to measure communication and emergent relational variables such as cohesion across multiple component teams. When coupled with repeated SNA, this would allow researchers to map how intra- and interteam behaviors and relationships change over time. This could answer questions such as how intrateam behaviors relate to interteam performance or how intrateam cohesion affects how interteam members relate to one another.
Beyond comparing network analyses during different phases of a simulation, a more complex way of analyzing temporal data is by using lag sequential analyses, which seek to identify nonrandom patterns of behavior during a task (Becker-Beck, 2001). They are useful for research questions that seek to identify how specific team behaviors (e.g., shifts in communication patterns across team members) develop and change over time (Leenders et al., 2016) and how specific patterns of behavior can lead to better team performance (Kauffeld & Meyers, 2009). An example of how lag sequential analyses have been used to study ETs during simulations is Cohen-Hatton and Honey (2015). Their research sought to identify whether commanders in the Fire Service adhered to the standard decision model used by the Fire Service, or whether they deviated from it. Participants were asked to “think aloud” (i.e., verbalize their thoughts) during a simulation, and transcripts were coded to identify if participants progressed through the prescribed model of “situation assessment” to “plan formulation” to “plan execution.” They found, using lag sequential analyses, that participants did not follow this pattern. However, a simple goal-oriented training intervention made participants more likely to adopt the prescribed processing pattern, without delaying decision speed. Lag sequential analyses are thus useful for helping to understand patterns in team processing and behavior during a simulated event and also provide possibilities for testing interventions to increase adherence to decision models and/or improve performance. For example, using this technique, research might develop our understanding of how patterns of behaviors change depending on information flow, level of stress in team members (as measured using physiological markers), changes in goal hierarchies, and the interaction between these variables. In doing so, we would have an enhanced understanding of the temporal and contextual influences on teamwork in ETs.
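The core computation in a lag sequential analysis can be sketched in a few lines of Python. The example below counts lag-1 transitions in a coded sequence and computes adjusted residuals, one common way of judging whether a transition occurs more or less often than expected if codes followed one another at random. The codes (SA, PF, PE, standing for situation assessment, plan formulation, and plan execution) and the sequence itself are invented for illustration and are not data from Cohen-Hatton and Honey (2015); the function name is likewise hypothetical.

```python
from collections import Counter
from math import sqrt


def lag1_analysis(codes):
    """Return lag-1 transition counts and adjusted residuals for coded behavior."""
    pairs = Counter(zip(codes, codes[1:]))   # (given, target) -> observed count
    total = sum(pairs.values())
    row, col = Counter(), Counter()
    for (given, target), count in pairs.items():
        row[given] += count                  # how often `given` starts a pair
        col[target] += count                 # how often `target` ends a pair
    residuals = {}
    for given in row:
        for target in col:
            expected = row[given] * col[target] / total
            variance = expected * (1 - row[given] / total) * (1 - col[target] / total)
            if variance > 0:
                residuals[(given, target)] = (pairs[(given, target)] - expected) / sqrt(variance)
    return pairs, residuals


# Invented think-aloud sequence coded into three decision phases.
codes = ["SA", "PF", "PE", "SA", "PF", "PE", "SA", "PE", "SA", "PF", "PE"]
counts, residuals = lag1_analysis(codes)
```

Large positive residuals (conventionally above 1.96) would suggest that a transition, such as situation assessment followed by plan formulation, occurs more often than chance. With sequences as short as this invented one, such values should be treated with caution.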
Bayesian statistics
Another approach to analyzing data from ET simulations is by using Bayesian statistics. Unlike network and temporal analyses, Bayesian statistics are not a type of data analysis but are an alternative statistical approach to classic significance testing. Traditional research on teams often draws on classic significance testing (e.g., null hypothesis testing, p values, confidence intervals) to test specific variables and theories. However, this approach is problematic when working with ETs as, at a practical level, it often calls for moderate to large sample sizes with normal distributions (see Wagenmakers et al., 2018, for other problems with classic theory). Research with ETs tends to involve small sample sizes as the participant pool is much smaller than the general population and participants often have limited time to take part in research (Bell et al., 2018). While efforts to address this have drawn on using trainees from ETs, such as trainee paramedics (e.g., Amacher et al., 2017), these samples have been shown to operate differently to “experts” (Boulton & Cole, 2016). In other types of ETs (e.g., emergency response, command and control), trainees may also not be as readily available as they are in clinical settings.
In response to problems with classic testing, researchers are calling for alternative methods of analysis (e.g., Vandekerckhove et al., 2018). One that has seen an increase in popularity, facilitated by advancements in computer algorithms and quicker hardware processing, is Bayesian statistics (e.g., see Special Issues in Journal of Mathematical Psychology 2016, vol. 72; Psychonomic Bulletin & Review 2018, vol. 25). As a set of tools, Bayesian statistics are attractive to ET simulation research as they open the potential, inter alia, for theoretical models to be tested even when samples are smaller than with conventional team research.
As a very broad (and somewhat simplified) overview, Bayesian statistics aim to quantify how probable the observed data are under each of two competing theoretical (i.e., statistical) models (Kruschke & Liddell, 2018). Using Bayes factors, a researcher infers the level of support for their theory, relative to the alternative theory, based on how much the observed data differ from that predicted. This is done by comparing the statistical model against a “posterior” probability distribution, which is made up from prior information known before data were collected and what is known from the actual (observed) data. Prior knowledge can come from theoretical frameworks, findings from previous research, subject experts, and pilot work (Zyphur & Oswald, 2015). Research may also use noninformative priors where knowledge is limited and parameters are set to cover a broad range of possible outcomes, but this is less advisable when samples are small (see McNeish, 2016). Bayesian statistics regard parameters (e.g., probabilities) as variables, and as such, parameters are adjusted as data accumulate and output is compared against starting values. The researcher can thus see how evidence for their theoretical (statistical) model changes with new data; something that is not possible with classic theory where parameters are regarded as constant (see Gelman et al., 2014, for a statistical overview of Bayes analysis; Lynch, 2010, for a general introduction; Jeffrey, 1961, for original writings).
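A small worked example may help to make the Bayes factor concrete. The Python sketch below assumes a deliberately simple, hypothetical case: each of a number of simulated incidents is scored as either following or not following a prescribed decision sequence, the null model fixes the adherence rate at 0.5, and the alternative model places a Beta prior on that rate (uniform by default). The scenario, the prior, and the function names are our assumptions for illustration, not an analysis taken from the studies cited.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor_10(successes, trials, null_rate=0.5, prior_a=1.0, prior_b=1.0):
    """Bayes factor for H1 (rate ~ Beta(prior_a, prior_b)) against H0 (rate = null_rate).

    The binomial coefficient appears in both marginal likelihoods and cancels,
    so it is omitted.
    """
    failures = trials - successes
    log_m1 = log_beta(prior_a + successes, prior_b + failures) - log_beta(prior_a, prior_b)
    log_m0 = successes * log(null_rate) + failures * log(1 - null_rate)
    return exp(log_m1 - log_m0)


# Hypothetical data: 9 of 10 incidents follow the prescribed sequence.
bf_strong = bayes_factor_10(9, 10)   # data favor H1 over the fixed-rate null
# Hypothetical data: 5 of 10 incidents follow the prescribed sequence.
bf_weak = bayes_factor_10(5, 10)     # below 1, so the data favor the null
```

A Bayes factor above 1 indicates that the data are more probable under the alternative model, and a value below 1 indicates that they are more probable under the null. Changing `prior_a` and `prior_b` shows how an informative prior alters the evidence obtained from the same small sample.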
Classical significance tests require researchers to specify in advance what the smallest effect size of interest is given their theory in order to recruit a sufficient number of participants capable of detecting such an effect. Yet, it has been shown using Bayesian analyses that a high-powered nonsignificant result might not necessarily constitute evidence for the null and that low-powered nonsignificant results are not necessarily insensitive (Dienes & McLatchie, 2018). Evidence suggests that sample sizes estimated using parameters generated through Bayesian analysis rather than power may be more flexible and yield smaller sample size requirements (Sambucini, 2017). Relatedly, Bayesian analysis has the benefit of allowing for “optimal stopping.” In essence, this allows for a researcher to track results as data are collected and stop data collection when a certain level of evidence showing one theory as more favorable has been obtained (Kelley, 2013). In addition to allowing for potentially smaller samples to be tested to obtain an effect, this avoids the ethical issue of testing ET members beyond what is needed.
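The logic of optimal stopping can be sketched as follows, again assuming the simplified hypothetical case of a binary outcome per simulated incident, a null adherence rate of 0.5, and a uniform prior under the alternative. The evidence threshold of 10 is an illustrative choice rather than a recommendation, and the function names are ours.

```python
from math import exp, lgamma, log


def bayes_factor_10(successes, failures, null_rate=0.5):
    """Bayes factor for a uniform-prior rate model against a fixed null rate."""
    log_m1 = lgamma(successes + 1) + lgamma(failures + 1) - lgamma(successes + failures + 2)
    log_m0 = successes * log(null_rate) + failures * log(1 - null_rate)
    return exp(log_m1 - log_m0)


def sequential_test(outcomes, threshold=10.0):
    """Update the Bayes factor after each observation and stop once the
    evidence passes the threshold in either direction.

    Returns the number of observations used and the Bayes factor at that point.
    """
    successes = failures = 0
    bf = 1.0
    for n, outcome in enumerate(outcomes, start=1):
        if outcome:
            successes += 1
        else:
            failures += 1
        bf = bayes_factor_10(successes, failures)
        if bf >= threshold or bf <= 1 / threshold:
            return n, bf
    return len(outcomes), bf


# Hypothetical run in which every incident follows the prescribed sequence.
outcomes = [True] * 12
n_used, final_bf = sequential_test(outcomes)
```

In this invented run, data collection would stop before all 12 planned incidents had been tested, which illustrates how monitoring evidence as it accumulates can reduce the demands placed on ET members.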
Bayesian analyses have been applied to a number of methods, from t-tests to structural equation modeling (Brown et al., 2019; McNeish, 2016). For ETs, they could be applied to existing methods (e.g., using a t-test to compare two sets of SNA across phases of a simulation) to identify effects that may have been masked by small sample sizes. Depending on the complexity of the theoretical model, we may start to move toward unpacking the different pathways through which factors have an effect on team performance, and the conditions that moderate these effects. This could be especially important in understanding the complex interplay between component team and system-level variables in MTS, which linear approaches may not be able to account for (Cronin, 2015). As interest in larger multiagency teams expands, we may see the use of Bayesian methods grow as researchers seek to test theoretical frameworks that span multiple levels (i.e., variables at the component team and system level), which traditional statistical approaches would not have the power to test when working with small sample sizes (Wang & Hanges, 2011).
Conclusion
Teamwork is a necessity in almost any 21st-century organization, with teams increasingly viewed as the solution to solving complex problems (Salas et al., 2015). This is especially so in organizations operating in extreme environments, where team members must coordinate their behavior effectively in order to avoid the severe, often life or death, consequences of poor performance. In this article, we have first identified the benefits of conducting simulation research with ETs, showing how simulations differ from existing methods. Second, we have presented a framework for conducting immersive simulations, focusing on three broad aspects: (i) study design, (ii) data collection, and (iii) data analysis. In doing this, we have reviewed existing simulation research, as well as suggested how emerging technologies (e.g., wearable devices, CAVE) and statistical methods (e.g., Bayesian) might be used in simulation research to advance understanding. It is hoped that this article will inspire researchers to make use of novel immersive simulation-based methods to engender the much-needed empirical research on ETs.
Footnotes
Authors’ Note
Olivia Brown is now affiliated to School of Management, University of Bath, Claverton Down, Bath, UK.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was funded by the Centre for Research and Evidence on Security Threats [ESRC Award: ES/N009614/1].
