
Abstract

Objective

Two studies serve as a manipulation check of a new experimental multi-task paradigm that can be applied to human-automation research (Virtual Reality Testbed for Risk and Automation Studies; ViRTRAS), in which a subjectively experienceable risk can be manipulated as part of a virtual reality environment.

Background

Risk has been postulated as an important contextual factor affecting human-automation interaction. However, experimental evidence is scarce due to the difficulty operationalizing risk in an ethical way. In the new paradigm, risk is varied by the altitude at which participants carry out the task, including the possibility of virtually falling in case of a mistake.

Method

Key components of the paradigm were used to investigate participants’ risk perception at a low (0.5 m) and a high (70 m) altitude using subjective self-reports and objective behavioral measures.

Results

In the high-altitude condition risk perception was significantly higher with medium to large effect sizes. In addition, results of the behavioral measures reveal that participants habituated with length of exposure. However, this habituation seems to occur similarly in both altitude conditions.

Conclusion

The manipulation checks were successful. The new paradigm is a promising tool for automation research. It incorporates the contextual factor of risk and creates a situation which is more comparable to what real-life operators experience. Additionally, it meets the same requirements as other multi-task environments in human-automation research.

Application

The new paradigm provides the basis to vary the contextual factor of risk in human-automation research, which has previously been either neglected or operationalized in an arguably inferior way.

Keywords

situational risk, human-automation interaction, degree of automation, multi-task paradigm, virtual reality

INTRODUCTION

Consequences of function allocation are well investigated in human-automation research. When automation functions properly there is a clear benefit in human-system performance with an increased degree of automation (DOA; Onnasch et al., 2014; O’Neill et al., 2020). However, on the infrequent occasions of automation failure, performance seems to be more negatively affected with increased DOA as well (Onnasch et al., 2014). In search of the reasons for these unintended consequences, studies show that operators working with automation supporting the decision-making process, that is, a relatively high DOA, tend to invest fewer resources in automation verification and monitoring compared to interacting with a lower DOA that only aids information analysis (Mosier & Manzey, 2019). Consequently, it might be less likely for the operator to detect an automation error and take over control if needed during automation breakdowns (Parasuraman & Manzey, 2010). Despite a considerable amount of laboratory-based evidence, the transferability of these results to the real-world work context has been partly challenged (Jamieson & Skraaning, 2019, 2020; Wickens et al., 2020). A major point of criticism is the insufficient regard for contextual factors of the operators’ working environment. One of these contextual factors is risk. Risk is defined as the product of the severity and the probability of a negative event (German Institute for Standardization, 2010). While the severity represents the extent of damage which would emerge if the negative event occurred, the probability constitutes the likelihood of this event. An increase in either probability or severity increases the risk. For many operators the work context bears a distinct risk as mistakes usually lead to negative consequences that could be catastrophic (e.g., in aviation, medicine, nuclear energy).
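The definition of risk cited in this paragraph can be written compactly. The symbols R (risk), S (severity), and P (probability) are introduced here for illustration only and are not taken from the source.

```latex
R = S \times P, \qquad S \ge 0, \quad 0 \le P \le 1
```

Under this form, holding P fixed and raising S raises R proportionally, and vice versa.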

The relevance of situational risk is not new to the literature. It has repeatedly been postulated as one of the most important contextual factors (e.g., Mayer et al., 1995; Parasuraman & Riley, 1997; Hoff & Bashir, 2015; Mosier & Manzey, 2019; Stuck et al., 2021). However, cases in which risk was experimentally manipulated are scarce. Most likely, this stems from the difficulty of operationalizing a directly experienceable risk in an ethical way.

For example, the manipulation of risk has been attempted by framing the cover story of the experiment with different degrees of risk (e.g., Lyons & Stokes, 2012). Possible negative consequences are therefore hypothetical and the effect on participants’ behavior presumably depends on their level of immersion in the story. Another approach is the use of pay-off matrices where participants receive a monetary reward dependent on their performance (e.g., Wiczorek & Manzey, 2014). This way, the situational risk entails a reduction of the (usually small) monetary reward. Moreover, risk has been operationalized by claiming that poor performance will prolong the experiment without additional compensation (e.g., Chancey, 2016). Thus, the risk derives from a possible waste of time. All these operationalizations affected the behavior of participants and enhanced their performance. However, it remains unclear whether these hypothetical threats, or rather inconveniences, are indeed comparable to the situational risk of some real-life operators whose own physical integrity is at stake.

To address the problem of implementing risk in a laboratory environment in a way that is ethical and at the same time more experienceable and self-referential, virtual reality (VR) could offer new possibilities. VR has previously been successfully used to safely expose participants to immersive environments which would usually be dangerous in real life (Ahir et al., 2019; Pallavicini et al., 2016). For example, Winslow et al. (2015) used a VR military simulation to expose participants to scenarios including the risk of being shot. An alternative way to induce risk in VR is the use of altitude and the possibility of falling. Wuehr et al. (2019) have demonstrated that the variation of altitude in a VR environment has a distinct effect on subjective as well as objective metrics. In their study, the exposure to a high altitude induced a more pronounced subjective anxiety and fear rating compared to a condition on ground level. Moreover, it triggered patterns of vegetative and postural responses analogous to those induced by real altitude stimuli, such as a general stiffening of the musculoskeletal system or a generally increased body sway. The emotional and physical responses to altitude exposure in VR are similar enough to real life that it has been used as a tool for exposure therapy for the treatment of acrophobia (e.g., Coelho et al., 2009).

Consequently, the exposure to a high altitude in a VR environment combined with the possibility of falling presumably causes a more pronounced perception of risk compared to a simulated situation close to the ground.

Following this rationale, we developed a new VR multi-task paradigm for human-automation research called Virtual Reality Testbed for Risk and Automation Studies (ViRTRAS). In ViRTRAS, risk can be manipulated by setting the altitude at which the task is completed, including the possibility of falling during task completion in case of a mistake. This enables researchers to incorporate the contextual factor of risk as part of the experimental environment. In the first part of this paper, the development of ViRTRAS is described in more detail including the tasks and options for independent and dependent variables. This is followed by the investigation of the paradigm’s premise that the manipulation of altitude induces different perceptions of risk. Results of two validation studies are reported using a modified version of the paradigm which included the subset of features needed to investigate the impact of altitude on participants’ risk perception in ViRTRAS.

VIRTUAL REALITY TESTBED FOR RISK AND AUTOMATION STUDIES

Requirement Analysis

Before starting the development of ViRTRAS, a requirement analysis of other experimental environments that are often deployed in the context of automation research was conducted (e.g., AutoCAMS 2.0, Manzey et al., 2008; MATB-II, Santiago-Espada et al., 2011; Pasteurizer II, Reising & Sanderson, 2002). The analysis revealed that the following requirements must be met by the new paradigm: multiple tasks, adaptable stress on the operator, manual as well as automation-supported conditions, and the ability to measure each participant’s performance and information sampling/monitoring. The following sections describe how these requirements were incorporated.

Implementation

Cover Story

The experimental paradigm is set in the not-so-distant future when mankind has started to build an information network in our galaxy. The participant slips into the role of an operator on the first prototype of a transmitter site on a different planet (Figure 1). The centerpieces of this are the oval so-called information clouds inside of long vertical transparent transmitter masts (Figure 1a). The information clouds frequently happen to crash and freeze because of the early-prototype character of the transmitter network. Once this happens the operator must travel in a small capsule (approximately the size of a small elevator; Figure 1b) to the location of the frozen information cloud to reset it by hand. To reach the information cloud from the capsule, a ramp is extended (Figures 1c and 2a) on which the operator can walk towards and manually reset the information cloud. The reset is the operator’s primary task.

Figure 1.

Screenshot of the VR environment in ViRTRAS. Left: Side view of the capsule at 70 m above the ground. Right: Frontal side view of the capsule at 0.5 m above the ground. a) The transparent transmitter masts containing the information clouds. b) The capsule. c) The extended ramp.

Figure 2.

Screenshots of the VR environment in ViRTRAS. a) View from inside the capsule towards the extended ramp and the frozen information cloud. b) Control panel inside the capsule used for diagnosis of the atmospheric condition. c) Panel for mixing and applying the protective alloy.

What is special about this planet are the frequently changing atmospheric conditions. There are a total of seven possible distinct atmospheric conditions of which six can be harmful to the material of the ramp the participants must walk over. Therefore, it is necessary to diagnose the current atmospheric condition to possibly apply one of six different alloys that protects the ramp. If protection of the ramp is necessary and the participant does not employ the appropriate alloy, the ramp will be damaged and crack. If the ramp cracks while the participant is still on it, they will virtually fall for 1.5 seconds before the environment fades out and turns white. Consequently, they will not experience virtually hitting the ground from higher altitudes.

Primary Task

The primary task can be divided into three phases: 1) Diagnosing the current atmospheric condition, 2) Deciding on the appropriate protective alloy for the ramp and applying it (if necessary), and 3) Stepping outside of the capsule and manually resetting the information cloud before returning to the capsule. These are detailed in the following.

First, the atmospheric condition is determined by the state of four different fictitious parameters: cosmic radiation, temperature, atmospheric electricity, and pH-value. For each of these parameters there is a specific test procedure. In six out of the seven conditions, one or two parameters are out of normal range, indicating a harmful environment the ramp needs to be protected from. The state of the parameters can be viewed and accessed via a control panel inside the capsule (Figure 2b). The participants can operate this panel with the controllers in their hands. In the manual condition without automation support, participants must monitor two to four parameters (depending on the order they decide to use) to conclusively determine the outside atmospheric condition. With automation support this monitoring is facilitated. Automation support can, for example, narrow down the number of possible diagnoses or even provide a final diagnosis to participants.

Secondly, once participants have diagnosed the current atmospheric condition they must decide if a protective alloy for the ramp is necessary. If so, participants then have to determine the appropriate liquids to be mixed and subsequently mix them (via button presses; Figure 2c). The resulting alloy can be applied to the ramp by further button presses. For conditions with automation support this can be allocated to the automation as well. In case all parameters are within the expected range, the step of mixing and applying the alloy can be skipped.

Thirdly, participants have to fulfill the primary task of resetting the information cloud outside of the capsule. For this phase, they press a button to extend the ramp towards the frozen information cloud. Once they have walked across the ramp, they place their right-hand controller into the information cloud and pull the trigger of the controller. Subsequently, the information cloud changes its color and starts moving again inside of the transmitter mast. The task is finished when participants have returned to the capsule and the automatic door is closed. This constitutes the completion of one trial.

Participants’ physical location is tracked by the VR-setup. To reach a destination within the environment participants physically walk within the laboratory and their re-positioning is reflected in the virtual display. This congruency between their real body movement and the visual feedback strongly enhances participants’ feeling of immersion and presence (Sanchez-Vives & Slater, 2005).

Secondary Task

With respect to the results of the requirement analysis, there is a continuous auditory reaction time task parallel to the primary task—the connection check. At the beginning of the experiment participants are assigned an operator number. During the experiment they hear a voice announcing an operator number asking for feedback. If the voice announces the participants’ assigned number, they have to pull the trigger of their left-hand controller as quickly as possible to indicate a proper connection. If the voice announces a different number, participants are not supposed to react to the probe. Feedback on correct responses, misses, and false reactions to unassigned operator numbers is indicated with different auditory signals. The difficulty of the task can be manipulated by changing the frequency of the communications.
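The scoring logic of the connection check described above can be sketched as follows. This is a minimal illustration, not the testbed's actual implementation: the names `Probe` and `classify_probe` are hypothetical, and the label `correct_rejection` for correctly ignored probes is an assumption, since the source only names correct responses, misses, and false reactions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    announced_number: int             # operator number spoken by the voice
    response_time_s: Optional[float]  # seconds until trigger pull; None if no pull


def classify_probe(probe: Probe, assigned_number: int) -> str:
    """Classify one auditory probe for a participant with `assigned_number`."""
    is_target = probe.announced_number == assigned_number
    responded = probe.response_time_s is not None
    if is_target and responded:
        return "correct_response"
    if is_target:
        return "miss"
    if responded:
        return "false_reaction"
    return "correct_rejection"  # assumed label, not named in the source


# Example: the participant is operator 7.
print(classify_probe(Probe(announced_number=7, response_time_s=0.62), 7))
print(classify_probe(Probe(announced_number=3, response_time_s=None), 7))
```

Response time would only be analyzed for probes classified as correct responses.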

(Possible) Automation Support of the Primary Task

At the current stage of development different DOAs have been programmed to support the primary task and can be selected when planning an experiment. Each of these corresponds to different stages and levels of automation by Parasuraman et al. (2000). For example, an automation corresponding to stage two (information analysis) narrows down the possible anomalous atmospheric conditions from six to three. Thus, the number of possible conditions is reduced but operators still must assess parameters for a conclusive diagnosis. Alternatively, the testbed can be configured by the experimenter to enable automation support of stage four (action implementation), which presents the current atmospheric condition and performs the mixing and application of the protective alloy automatically if the operator does not veto it within a set time. Several gradations between these two examples have been programmed, from which the experimenter can choose. Reliability can be adjusted by scripting automation’s mistakes and/or breakdowns.
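As a rough illustration of the stage-two example above, the sketch below narrows six anomalous conditions to three candidates. Everything here is hypothetical: the condition labels are placeholders, the function name is invented, and modelling a scripted automation mistake as "the true condition is missing from the shortlist" is only one assumed way such a failure could look.

```python
import random

# Placeholder labels; the source does not name the six anomalous conditions.
ANOMALOUS_CONDITIONS = ["C1", "C2", "C3", "C4", "C5", "C6"]


def narrow_candidates(true_condition, rng, n_shown=3, scripted_failure=False):
    """Return a shortlist of `n_shown` candidate conditions.

    With a reliable automation the shortlist contains the true condition.
    With `scripted_failure=True` (assumed failure mode) it does not.
    """
    others = [c for c in ANOMALOUS_CONDITIONS if c != true_condition]
    if scripted_failure:
        shown = rng.sample(others, n_shown)
    else:
        shown = [true_condition] + rng.sample(others, n_shown - 1)
    rng.shuffle(shown)  # avoid revealing the answer by position
    return shown


rng = random.Random(42)
print(narrow_candidates("C4", rng))
print(narrow_candidates("C4", rng, scripted_failure=True))
```

The operator would still have to check parameters to decide among the three remaining candidates.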

Metrics

Several dependent variables can already be recorded, with many conceivable additions possible. There are two different metrics quantifying the performance in the primary task (correct diagnosis and decision time) as well as in the secondary task (correct responses and response time). Moreover, information sampling as well as automation verification behavior can be quantified by measuring the number of parameters participants access and how much time they invest to do so. In addition, subjective ratings and questionnaires can easily be added to the paradigm and set to be triggered at different moments within each trial (e.g., when the participant returns to the capsule after task completion). Items can be answered using the hand-held controllers.

Situational Risk

Risk is varied by the altitude at which the information cloud freezes, which determines where the primary task must be carried out. There is the possibility of the ramp cracking if the participant made a mistake when diagnosing the atmospheric condition or choosing the protective alloy. Thus, regarding the definition of risk, the severity of a negative event is manipulated, in this case falling. Falling from half a meter, for example, (usually) has a much less severe consequence than falling from 70 m. The other aspect of the risk definition (the probability of the negative event) cannot be manipulated experimentally but can be affected by participants through their performance at diagnosing the situation and implementing the appropriate action. Note that the situational risk is implemented dichotomously (low vs. high), as it is not assumed to increase linearly with the altitude of the setting. The severity of falling from 70 m in real life compared to only 50 m is not considerably different.

In summary, the goal of ViRTRAS is to provide an experimental environment for automation research that allows the immersive illusion of risk to be varied while at the same time not creating a harmful situation for participants. To demonstrate that this operationalization of a virtual situational risk is effective, two validation studies were conducted.

STUDY I

In ViRTRAS, situational risk is manipulated by the altitude at which the primary task must be carried out, which involves crossing the ramp to the information cloud. The goal of Study I was to validate that walking across the ramp at 70 m above ground induces a more pronounced perception of risk compared to only 0.5 m above ground. To demonstrate this, multiple subjective ratings as well as objective behavioral metrics were employed as indicators for perceived situational risk. The study was registered at Open Science Framework (doi: 10.17605/OSF.IO/YWEZX). Results of this first study have previously been presented at the HFES 65th International Annual Meeting (Hösterey & Onnasch, 2021).

Method

Participants

Thirty-eight individuals participated in the study. Two participants had to be excluded: One prior to the experiment because they met the criterion for acrophobia determined with the visual Height Intolerance Severity Scale (vHISS; Huppert et al., 2017). The vHISS was administered to all potential participants prior to the experiment as acrophobia was an exclusion criterion. Another participant was excluded because of a technical error of the head-mounted display. Thus, data of 36 participants (23 female, 13 male, 0 non-binary) ranging in age from 18 to 72 (M = 29.3, SD = 13.1) were analyzed. Fifteen of them indicated that they had used a head-mounted VR display before, of whom two had done so for longer than 3 hours in total. A prior power analysis (G*Power; Faul et al., 2009) resulted in a planned sample size of 50 participants. However, an exponential rise in COVID-19 cases in Germany in December 2020 caused us to prematurely terminate data collection. Thus, data analysis was conducted with the available data up to that point. This research complied with the American Psychological Association Code of Ethics and was approved by the ethics committee of the department of psychology, Humboldt-Universität zu Berlin. Informed consent was obtained from each participant.

Task and Apparatus

A modified version of ViRTRAS was used for the experiment. It was rendered using Unreal Engine software (Version 4.24; Epic Games Inc., Cary, NC). Participants were exposed to the virtual environment with a head-mounted display (HTC Vive Pro, HTC, Taiwan, ROC).

The virtual scene was set in the same environment as the paradigm described above. However, participants’ only task was to step outside of the capsule and cross the ramp to ‘activate’ a cube at the end of the ramp (Figure 3). The cube was activated by touching it while pulling the trigger of the right-hand controller. This resulted in the controller vibrating and the cube turning green. This task was analogous to the third phase of the primary task of ViRTRAS. Once the participants returned to the capsule the door closed automatically. At the end of most trials, participants were presented with some questions concerning the trial they had just experienced. Answers were provided using the controllers. Once this was finished, the next trial started when the door opened again.

Figure 3.

View from inside the capsule towards the unactivated cube in the modified version of ViRTRAS. a) Low-altitude condition (0.5 m above the ground). b) High-altitude condition (70 m above the ground).

Design

The study used a 2 (altitude) × 4 (trial) within-subject design. This resulted in a total of eight trials per participant. The order of the trials was semi-randomized, restricted so that each condition could not be experienced more than three times in a row.
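One simple way to obtain such a restricted random order is rejection sampling: shuffle, then reshuffle until no condition occurs more than three times in a row. The sketch below is an assumption about how this could be done, not the procedure the authors used, and all function names are hypothetical.

```python
import random


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 0
    prev = object()  # sentinel that equals no trial label
    for item in seq:
        run = run + 1 if item == prev else 1
        longest = max(longest, run)
        prev = item
    return longest


def semi_randomized_order(trials_per_condition=4, max_run=3, rng=None):
    """Shuffle low/high trials until no condition repeats more than `max_run` times."""
    rng = rng or random.Random()
    order = ["low"] * trials_per_condition + ["high"] * trials_per_condition
    while True:
        rng.shuffle(order)
        if max_run_length(order) <= max_run:
            return list(order)


print(semi_randomized_order(rng=random.Random(1)))
```

With four trials per condition only a small share of shuffles is rejected, so the loop terminates quickly in practice.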

Subjective Ratings

The questionnaire consisted of both pre-existing validated scales assessing dimensions of anxiety and arousal and experimenter-generated items that were designed to infer the participants’ perception of risk:

1. The Self-Assessment Manikin (SAM) on the dimensions ‘valence’ and ‘arousal’ was used. SAM is a non-verbal picture-based assessment technique with which participants can select images of a manikin representing different emotional states (9-point scale; Bradley & Lang, 1994). While the arousal dimension was of main interest, the valence dimension was assessed to confirm that a higher arousal was not accompanied by more positive emotions (higher scores represent stronger arousal and a more positive valence).

2. The sum score of the short German state version of the State-Trait-Anxiety Inventory with ten items on an 8-point Likert scale was used (STAI-S; Grimm, 2009). Exemplary items translated into English are: “I feel tense” and “I feel worried.”

3. The following experimenter generated items were used on a 6-point Likert scale. The items were presented in German and higher ratings mirrored stronger agreement: 1) “I found it difficult to step out of the capsule” (difficulty stepping out), 2) “I deliberately avoided falling off the extended ramp” (avoidance falling off), 3) “I would have dared to take a step off the extended ramp, causing me to fall off”, (dared to step off), 4) “I felt walking over the extended ramp was a risk” (feeling risk).

Objective Indicators

The following measures were calculated based on participants’ behavior. They were assessed for each separate trial.

1. Hesitation: The time it took participants to leave the capsule once the door opened. Longer times were expected to indicate more hesitant behavior.

2. Caution: The total time participants spent outside of the capsule from leaving to returning to the capsule. Longer times were assumed to signify more cautious behavior.

It was supposed that longer times in both measures are due to a more pronounced perception of situational risk.
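The two behavioral measures can be derived from three event times per trial, as in the sketch below. The field names and the assumption that the testbed logs these three timestamps in seconds are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TrialTimestamps:
    door_opened_s: float          # capsule door opens
    left_capsule_s: float         # participant steps out onto the ramp
    returned_to_capsule_s: float  # participant is back inside the capsule


def hesitation(t: TrialTimestamps) -> float:
    """Time from door opening until the participant leaves the capsule."""
    return t.left_capsule_s - t.door_opened_s


def caution(t: TrialTimestamps) -> float:
    """Total time spent outside the capsule."""
    return t.returned_to_capsule_s - t.left_capsule_s


trial = TrialTimestamps(door_opened_s=0.0, left_capsule_s=2.5, returned_to_capsule_s=14.0)
print(hesitation(trial), caution(trial))  # 2.5 11.5
```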

Procedure

Participants were informed about the virtual exposure to extreme altitudes and the possibility of virtually falling in the invitation to the study and were reminded once again when they arrived. However, in this study virtually falling, even when stepping off the ramp, was not implemented and therefore impossible. Participants did not know that. To minimize the chance of an unpleasant experience, they were reminded that ending the experiment prematurely did not bear negative consequences and they were encouraged to take off the head-mounted display whenever they felt uncomfortable. After giving their informed consent, they answered demographic queries on a tablet computer. Additionally, they completed the vHISS. When participants did not meet the criterion for a diagnosis of acrophobia, they were instructed to put on the head-mounted display and were handed a controller for each hand. Then, participants received a short tutorial inside the VR environment in which the task as well as the functionality of the controllers were explained. Additionally, they completed three training trials of the task in a completely white environment with very limited altitude cues. Afterwards, the view from the high-altitude condition was presented for at least 15 seconds before they could start the main experiment. This was implemented to give participants the opportunity to become familiar with the view, making it less likely they would do so during the experimental trials. At the start of each trial participants stood inside the capsule facing the initially closed sliding door, which then automatically opened simultaneously with the extension of the ramp (214 cm long and 110 cm wide). To further increase immersion, appropriate noises were presented via the headphones of the head-mounted display.
When participants returned to the capsule, they answered the subjective ratings on risk perception (SAM, STAI-S, experimenter generated items) within the VR environment using their hand-held controllers. Each rating was answered once per altitude condition. To prevent an overload of questions in one trial, they were distributed across trials. One trial took participants ten to 40 seconds excluding the time for subjective ratings. After completion of all eight trials, participants removed the head-mounted display and answered an additional questionnaire on a tablet computer before the experiment ended. Completion of the entire experiment took approximately 30 minutes.

Data Analysis

The subjective ratings were analyzed using paired sample t-tests (one data point per condition) while the objective indicators were analyzed using a 2 (altitude) × 4 (trial) repeated measures ANOVA (four data points per condition). Prior to inferential statistical analysis the objective behavioral data were standardized by each participant’s mean and standard deviation (participant-wise standardization). Using this transformation each participant was their own anchor with the value zero being their individual mean. The value one would indicate that a measurement was one standard deviation above the individual’s mean. This is advantageous in two ways. First, for our purpose the intraindividual change in participants’ reaction times between the low- and high-altitude conditions was the most relevant. For this particular goal, participant-wise standardized values therefore had a higher information content because overall mean scores might have been skewed by exceptional time values of single individuals. Second, this transformation enabled us to better handle extreme values due to measurement errors. For example, if a participant paused to adjust their head-mounted display before stepping out of the capsule this would distinctly distort this trial’s reaction time. Replacing this value with a zero would be closer to the individual’s real value than, for example, using the sample’s mean. To make sure that only values due to a clear measurement error were corrected, a very conservative criterion was chosen for replacement: Only standardized time values of five or higher would have been replaced by the individual’s mean (i.e., zero). However, this was not necessary for this study as no values reached such magnitude.
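The participant-wise standardization and the replacement rule described above can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is a guess, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def standardize_participant(times, cutoff=5.0):
    """Z-standardize one participant's trial times against their own mean and SD.

    Standardized values of `cutoff` or higher are treated as measurement
    errors and replaced by 0.0, i.e., the participant's own mean.
    Assumes at least two trials with non-identical times (SD > 0).
    """
    m = mean(times)
    s = stdev(times)  # sample SD; an assumption, see lead-in
    z = [(t - m) / s for t in times]
    return [0.0 if value >= cutoff else value for value in z]


# Three trial times in seconds for one participant.
print(standardize_participant([2.0, 4.0, 6.0]))
```

The function would be applied separately to each participant and each measure before the values enter the repeated measures ANOVA.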

Results

Subjective Ratings

Means, standard deviations, and t-test results are presented in Table 1. Participants felt more anxiety and a higher arousal in the high-altitude condition. This was not accompanied by a more positive valence. They also reported experiencing a higher difficulty stepping out of the capsule and felt a stronger urge to avoid falling off the ramp. Additionally, in the high-altitude condition participants disagreed more strongly with the statement that they would have dared to step off the ramp, which would have resulted in a fall. They also agreed more strongly that crossing the ramp felt like a risk compared to the low-altitude condition.

TABLE 1:

Mean (SD) of the Subjective Ratings in the Low- and High-Altitude Condition and the Results of the Paired Sample t-Tests With Cohen’s d as an Effect Size Estimate

Measure	Low	High	t	p	d
SAM arousal	2.5 (1.5)	3.1 (1.5)	3.42	.001	.57
SAM valence	7.4 (1.1)	7.2 (1.2)	1.36	.09	.23
STAI-S sum score	20.3 (9.3)	30.4 (12.2)	6.52	<.001	1.09
Difficulty stepping out	1.1 (0.3)	1.7 (0.9)	3.80	<.001	.63
Avoidance falling off	3.0 (2.0)	3.9 (1.8)	4.74	<.001	.79
Dared to step off	4.5 (1.6)	3.5 (1.8)	3.83	<.001	.64
Feeling risk	1.1 (0.4)	2.3 (1.2)	6.04	<.001	1.01
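The effect sizes in Table 1 are numerically consistent with the common paired-samples conversion d = t / √n with n = 36 participants. The source does not state how d was computed, so the sketch below is a plausibility check under that assumption, not a description of the authors' procedure.

```python
from math import sqrt


def cohens_d_from_paired_t(t_value: float, n_pairs: int) -> float:
    """Effect size for a paired t-test: d = t / sqrt(n)."""
    return t_value / sqrt(n_pairs)


# t values taken from Table 1, n = 36 participants.
for label, t_value in [("SAM arousal", 3.42), ("STAI-S sum score", 6.52), ("Feeling risk", 6.04)]:
    print(label, round(cohens_d_from_paired_t(t_value, 36), 2))
```

For these three rows the conversion gives 0.57, 1.09, and 1.01, matching the tabulated values.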

Objective Indicators

Results revealed that participants hesitated longer to take their first step out of the capsule in the high-altitude condition compared to the low-altitude condition (Hesitation; Figure 4a). This was supported by a significant main effect of altitude, F (1, 35) = 5.90, p = .020, η² = .14. Additionally, the main effect of trial was significant, F (3, 105) = 3.53, p = .017, η² = .09. This indicates that participants were faster in stepping out of the capsule in later trials compared to earlier ones. There was no significant interaction effect (F < 1, p = .98).

Figure 4.

Both objective indicators for each trial and both altitude conditions of Study I. a) Participant-wise standardized values for Hesitation (time participants took to leave the capsule after the door opened). b) Participant-wise standardized values for Caution (time participants spent outside of the capsule). Error bars depict the 95% confidence interval.

Participants took longer to fulfill the main task and return to the capsule in the high-altitude condition compared to the low-altitude condition (Caution; Figure 4b). This was supported by a significant main effect of altitude, F (1, 35) = 28.53, p < .001, η² = .45. The main effect of trial was also significant, F (2.44, 85.50) = 11.41, p < .001, η² = .25. This indicates that participants increased their speed in the course of the experiment. There was no significant interaction effect for Caution (F = 1.61, p = .19).
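The reported η² values for both objective indicators are numerically consistent with partial eta squared recovered from the F statistics and their degrees of freedom. The source labels the statistic only as η², so treating it as partial η² is an assumption; the sketch below is a plausibility check.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics as reported for Hesitation and Caution in Study I.
print(round(partial_eta_squared(5.90, 1, 35), 2))         # Hesitation, altitude
print(round(partial_eta_squared(3.53, 3, 105), 2))        # Hesitation, trial
print(round(partial_eta_squared(28.53, 1, 35), 2))        # Caution, altitude
print(round(partial_eta_squared(11.41, 2.44, 85.50), 2))  # Caution, trial
```

These evaluate to 0.14, 0.09, 0.45, and 0.25, in line with the values reported above.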

Discussion Study I

The study successfully demonstrated that the manipulation of risk through the variation of altitude has an impact on the individuals’ perception of risk. This was illustrated by significant mean differences in subjective self-reports as well as objective behavioral data with medium to large effect sizes (Cohen, 1988). These are more thoroughly addressed in the overall discussion below.

Participants hesitated significantly longer to take the first step out of the capsule and seemed to act more cautiously when crossing the ramp in the high-altitude condition. However, they significantly increased their speed in these measures in both altitude conditions during the experiment. This could be interpreted as familiarization with and improvement in the task, as participants becoming accustomed to the altitude exposure, or as a combination of both. The lack of an interaction effect between altitude and trial implied that participants became accustomed to the overall situation equally in both conditions. However, with only four trials per condition a profound analysis of habituation was not possible. We therefore decided to conduct an additional study with considerably more trials to explore the progress of habituation in greater depth.

STUDY II

The second study investigated habituation to risk perception more closely in the objective behavioral measures. The main change in Study II was the increase of the number of times participants had to cross the ramp in each condition from four to 30. That way, two open questions concerning the behavioral measures could be addressed: First, does the main effect of altitude remain when participants conduct the task much more often? Second, does the habituation indeed proceed similarly in both risk conditions so that no interaction between altitude and trial can be observed? The study was registered at Open Science Framework (doi: 10.17605/OSF.IO/ZEWGY) and was approved by the ethics committee of the department of psychology, Humboldt-Universität zu Berlin.

Method

Participants

Twelve individuals participated in the study (8 female, 4 male, 0 non-binary). An a priori power analysis (G*Power; Faul et al., 2009) indicated that this sample size was sufficient to detect an interaction effect of medium size. Participants’ age ranged from 19 to 56 (M = 27.5, SD = 9.4). No participant was excluded from the experiment or data analysis because of an acrophobia diagnosis (vHISS; Huppert et al., 2017) or for any other reason. Five participants indicated that they had used a head-mounted VR display before, one of whom had done so for longer than 3 hours in total. No participant in Study II had participated in Study I.

Task and Apparatus

The same setup, virtual environment, and main task as in Study I were used in the second study. Participants again had to “activate” the cube at the end of the ramp, which was analogous to the third phase of the ViRTRAS primary task. However, we added two new minor tasks that participants had to perform inside the capsule in each trial. The first resembled a common CAPTCHA task. On a screen inside the capsule, similar to ViRTRAS' control panel, participants were presented with nine squares filled with blue rectangles at random positions and in different sizes. Participants were asked to use their hand-held controller to select the three squares that additionally depicted a red triangle. After confirming their selection, they were prompted to mix two liquids of specified colors at the mixing panel on the opposite side of the capsule. After finishing the liquid mix, participants could manually open the capsule’s door and cross the ramp to “activate” the cube in the same way as in Study I. The new tasks were added to incorporate panels similar to those used in the paradigm and thereby increase the tasks’ resemblance to the full version of ViRTRAS. Additionally, they extended the time between two trials, giving participants the opportunity to have a short pause before stepping out of the capsule again.

Design

The study used a 2 (altitude) × 30 (trial) within-subject design, which resulted in a total of 60 trials per participant. The trials were divided into three sets of 20 trials with ten trials in each condition (high and low altitude). The order of trials within the three sets was semi-randomized, restricted so that each condition could not be experienced more than three times in a row.
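The ordering constraint described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes simple rejection sampling, and it assumes that the limit of three identical conditions in a row is enforced within each set of 20 trials only (runs across the pauses between sets are not checked). All function names are hypothetical.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def make_set(n_per_condition=10, max_run=3, rng=random):
    """One set of trials: equal counts per condition, no run longer than max_run."""
    trials = ["low"] * n_per_condition + ["high"] * n_per_condition
    while True:
        rng.shuffle(trials)
        # Rejection sampling: reshuffle until the run constraint holds.
        if longest_run(trials) <= max_run:
            return list(trials)


def make_session(n_sets=3, seed=None):
    """Three sets of 20 trials give 60 trials, 30 per altitude condition."""
    rng = random.Random(seed)
    return [make_set(rng=rng) for _ in range(n_sets)]


session = make_session(seed=42)
```

Each element of `session` is one set of 20 trial labels; concatenating the three sets yields the 60-trial sequence for one participant.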

Objective Indicators

The temporal measures Hesitation and Caution used in Study I were again deployed and measured in each trial.

Subjective Rating

The German version of the STAI-S (Grimm, 2009) already used in Study I was administered only after the last trial in each condition. Additionally, after answering the questionnaire, participants were asked in which altitude condition they had conducted their task (binary choice). This was done to verify that participants’ answers applied to the respective altitude condition.

Procedure

Study II’s procedure differed from that of Study I only in the following point: After 20 as well as 40 trials, the experiment was briefly paused to give participants the opportunity to decide whether they wanted to take a break or continue immediately. Completion of the experiment took approximately 45 minutes.

Data Analysis

The objective indicators were analyzed using a 2 (altitude) × 30 (trial) repeated measures ANOVA. Prior to analysis, values of the objective indicators were standardized participant-wise (see data analysis of Study I). For both measures, extreme time values caused by a measurement error had to be replaced twice (once in each altitude condition). The sum score of the STAI-S was analyzed using a paired-sample t-test. As this study’s focus was on the objective indicators, the sample size was determined on the basis of these measures, which left the subjective ratings underpowered for inferential statistical analysis.
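Participant-wise standardization can be illustrated as follows. This is a minimal sketch under stated assumptions: it treats the standardization as a z-transformation over all of a participant's trials (both altitude conditions pooled) using the sample standard deviation. The exact procedure is described in the data analysis of Study I and may differ. The data and function names are hypothetical.

```python
from statistics import mean, stdev


def standardize_within_participant(times):
    """z-standardize one participant's time values (assumed: sample SD)."""
    m = mean(times)
    s = stdev(times)
    if s == 0:
        raise ValueError("cannot standardize constant values")
    return [(t - m) / s for t in times]


def standardize_all(data):
    """Apply the standardization separately to every participant."""
    return {pid: standardize_within_participant(v) for pid, v in data.items()}


# Hypothetical times in seconds, not data from the study:
# p02 is generally slower than p01 but shows the same relative pattern.
raw = {"p01": [4.0, 3.0, 2.0], "p02": [10.0, 8.0, 6.0]}
standardized = standardize_all(raw)
```

After standardization both hypothetical participants show the same profile, so differences in general walking speed no longer contribute to the altitude and trial effects.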

Results

Objective Indicators

Participants hesitated longer in the high-altitude condition to take their first step out of the capsule, which was supported by a significant main effect of altitude, F (1, 11) = 8.35, p < .05, η² = .43. Participants became faster in taking their first step out of the capsule, which was supported by a significant main effect of trial, F (29, 319) = 14.33, p < .001, η² = .57. There was no interaction between altitude and trial for Hesitation, F (29, 319) = 1.03, p = .42 (Figure 5a).

Figure 5.

Both objective indicators for each trial and both altitude conditions of Study II. a) Participant-wise standardized values for Hesitation (time participants took to leave the capsule after the door opened). b) Participant-wise standardized values for Caution (time participants spent outside of the capsule). Error bars depict the 95% confidence interval.

In the high-altitude condition, participants took longer to cross the ramp, which was supported by a main effect of altitude for Caution, F (1, 11) = 5.16, p < .05, η² = .32. Participants became faster in conducting the task, which was supported by a main effect of trial, F (29, 319) = 7.87, p < .001, η² = .42. There was no interaction between altitude and trial for Caution, F (29, 319) = 1.01, p = .46 (Figure 5b).
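The effect sizes reported for both indicators can be cross-checked from the F statistics and their degrees of freedom. The following is a minimal sketch, assuming that the reported η² values are partial eta squared (the text does not state which variant was used); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for Study II
reported = [
    ("Hesitation, altitude", 8.35, 1, 11),
    ("Hesitation, trial", 14.33, 29, 319),
    ("Caution, altitude", 5.16, 1, 11),
    ("Caution, trial", 7.87, 29, 319),
]

recomputed = {
    label: partial_eta_squared(f, df1, df2) for label, f, df1, df2 in reported
}
for label, value in recomputed.items():
    print(f"{label}: {value:.2f}")
```

Under this assumption, the recomputed values agree with the reported ones to two decimals (.43, .57, .32, and .42).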

Subjective Rating

Data from participants who could not correctly identify the altitude condition to which each post-trial probe applied were excluded from the analysis. This was the case for four participants, leaving eight participants for this analysis. Sum scores of the STAI-S (low: M = 16.63, SD = 8.45; high: M = 22.00, SD = 11.39) significantly differed after the last trial of each altitude condition, t = 2.10, p < .05, d = .73.
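The structure of such a paired comparison can be sketched as follows. This is an illustration only: the scores are hypothetical and not the study's data, and the standardized mean difference shown here (mean difference divided by the standard deviation of the differences) is one common variant that is not necessarily the one behind the reported d. No p value is computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(low, high):
    """Return (t, df, d_z) for paired scores from two conditions."""
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [h - l for l, h in zip(low, high)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (sd / sqrt(n))
    d_z = mean(diffs) / sd  # assumed effect size variant
    return t, n - 1, d_z


# Hypothetical sum scores for three participants
t, df, d_z = paired_t(low=[1.0, 2.0, 3.0], high=[2.0, 4.0, 6.0])
```

With eight complete pairs, as in the analysis above, this function would return df = 7.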

Discussion Study II

The second study demonstrated that the altitude manipulation altered the perception of risk, as reflected in the behavioral measures, even when participants were exposed to the altitude for much longer. Throughout the experiment participants acted more hesitantly and cautiously in the high-altitude condition, with medium to large effect sizes according to Cohen (1988). Habituation occurred in both measures during the experiment. There was no interaction between altitude and trial, implying that habituation proceeded similarly in both conditions. In other words, even after a substantial number of trials participants seemed to perceive the high-altitude condition as riskier, which resulted in objectively measurable hesitant and cautious behavior. This was further underlined by the significant difference in subjective risk perception assessed by the questionnaire measuring participants’ state anxiety after the last trial in each condition.

OVERALL DISCUSSION

The present studies were conducted to demonstrate that the new experimental paradigm’s (ViRTRAS) manipulation of risk through the variation of altitude has an impact on individuals’ perception of risk. This was successfully illustrated in the first study by significant mean differences in validated scales as well as in experimenter-generated items relevant to the specific task and paradigm. The effect sizes varied from medium to large. However, even though participants agreed more strongly with the statement that crossing the ramp felt like a literal risk at the higher altitude, the absolute ratings were rather low. This is desirable, as inducing an even higher perception of risk is not the intention of the paradigm. It is vital that the line between an ethically sound operationalization of a simulated, experienceable risk to oneself and the provocation of real and pronounced fear in participants is not crossed. Moreover, it is reasonable to assume that when participants answered this query, they were well aware that their physical well-being was never in danger despite the immersive simulation. That participants tended to disagree with having felt an actual risk even in the high-altitude condition is therefore consistent with the rest of the data. The effectiveness of the manipulation was further accentuated by the large effect sizes of the mean differences between both altitude conditions. In sum, results of the subjective indicators demonstrated a successful and ethical risk operationalization in ViRTRAS with considerable differences in self-reported risk perception.

Alongside the evidence from self-reports there was substantial additional support for the effectiveness of the risk manipulation in the behavioral data. In both studies in the high-altitude condition participants hesitated longer to take a step out of the capsule and took more time to finish the task outside implying more caution. Again, results revealed strong effect sizes. It is unlikely that these time differences originated from the fact that participants stopped to enjoy the view in the high-altitude condition, which would have implied casualness. Before the experimental trials, participants were given as much time as they liked to enjoy the view in the high-altitude condition. Additionally, they were explicitly instructed to perform the task as fast as possible—but to take the time they needed. Arguably it is more likely that longer times at the higher altitude were a result of behavioral reactions such as the stiffening of the musculoskeletal apparatus and an increase in body sway reported by Wuehr et al. (2019) due to the perception of the situation as risky. Hence, in combination with the results of the subjective ratings, it seems justifiable to interpret longer times in these metrics to be an indication of more cautious and hesitant behavior.

It is notable that participants’ time values for Caution in Study I were particularly high in the third trial in the high-altitude condition (Figure 4B). In search of a reason for this data pattern, various possible explanations could be ruled out: There were no exceptional events in the lab that might have caused the result pattern. Moreover, due to the semi-randomization of the trials’ order the third trial could have been at very different points in time for each individual (any trial between the third and seventh). Furthermore, the experimenter was required to always note in the protocol if participants stopped or slowed down, for example, because they adjusted their head-mounted display. However, this was not the cause here. Also, this pattern of the third trial was not replicated in the second study although both experimental designs were very similar. We therefore assume that this peculiarity in the third trial of the first study was a mere coincidence.

In addition to the main effect of altitude, the behavioral results revealed a significant main effect of trial: In both measures participants increased their speed during the experiment. The lack of a significant interaction of these factors in the first study suggested that the increase in speed did not differ considerably between the two altitude conditions. To further investigate this, the second study was conducted with a much larger number of trials. Besides the clear main effect of altitude, results revealed no interaction between the factors altitude and trial, implying that habituation to this task proceeded similarly in both altitude conditions. This interpretation was further backed by the significant results of the anxiety questionnaire (STAI-S) assessed after the last trial in each altitude condition. Nonetheless, the clear decrease in both objective measures throughout the experiment likely implies that participants continuously became more accustomed to the altitude exposure. However, arguably, this does not reduce the external validity of this paradigm. It is well known that a person’s evaluation of a risk is attenuated with the duration of the exposure to it (Rohrmann & Renn, 2000). When using ViRTRAS, it is therefore important to control for whether individuals have participated in an experiment with this paradigm before.

A limitation of manipulating risk via altitude is that the severity of a fall in real life does not increase with the altitude linearly. At very low altitudes, it increases very quickly. However, at altitudes in which a fall in real life would certainly be fatal, the severity reaches a plateau. In contrast, the consequence of falling from lower altitudes can vary tremendously from none to severe injury and death. A small-stepped classification of risk levels is therefore difficult. To increase the granularity of risk levels, future studies could aim to vary the second aspect of the risk definition as a further development of this paradigm: the probability of a negative event. This could be accomplished by adjusting the occurrence probability of a harmful atmospheric condition for the ramp and disclosing that to the participants. Furthermore, it is conceivable to expand risk levels by manipulating the visibility of the ramp with fog or smoke, thus, increasing the perceived probability of a misstep.
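The idea of gaining finer risk levels through the probability component can be made concrete with the risk definition used in this paper (severity multiplied by probability). The following is a toy sketch: the severity scale (normalized so that a certainly fatal fall equals 1.0) and the probability values are purely hypothetical and are not taken from the studies.

```python
def risk(severity, probability):
    """Risk as the product of severity and probability of a negative event."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if severity < 0:
        raise ValueError("severity must be non-negative")
    return severity * probability


# Hypothetical: at altitudes where a fall would certainly be fatal,
# severity sits on its plateau and can no longer be increased.
PLATEAU_SEVERITY = 1.0

# Hypothetical disclosed probabilities of the negative event.
probabilities = [0.1, 0.25, 0.5]
risk_levels = [risk(PLATEAU_SEVERITY, p) for p in probabilities]
```

Even with severity fixed at its plateau, the three hypothetical probabilities produce three distinct, ordered risk levels, which is the kind of granularity that altitude alone cannot provide.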

A further limitation is that the studies’ focus of investigation was on the risk factor only, which was the innovative part of the new paradigm. Although very close attention was paid to ensure that ViRTRAS meets the same requirements as other paradigms in human-automation research, the experimental setup of these validation studies only manipulated a subset of these features, namely the risk factor and did not implement automation. To further validate that ViRTRAS is indeed an appropriate paradigm for automation research that incorporates situational risk, future studies should systematically vary the DOA and provide data patterns comparable to previous research (e.g., reduced monitoring and workload with higher DOAs). This should be particularly valid for the low-altitude condition as this most likely resembles experimental setups that have not incorporated risk.

In summary, the manipulation checks of risk perception were successful, and the results are in line with the increase in subjective fear with increasing altitude reported by Wuehr et al. (2019). The present studies therefore confirm that the execution of a task in a VR environment at higher altitudes has considerable effects on various subjective as well as objective variables, indicating a higher perception of situational risk even for up to at least 30 trials.

CONCLUSIONS

When real-life operators interact with automation support to search for and integrate multiple pieces of information to come to a decision and choose the appropriate subsequent actions, mistakes usually have negative consequences (e.g., a radiologist deciding whether tissue is cancerous or not; a pilot adjusting the flight altitude to avoid collision with another airplane). While the simulations used in laboratory experiments successfully reproduce the cognitive demands of the real-life work context (e.g., Manzey et al., 2008; Santiago-Espada et al., 2011), participants usually do not face any negative consequences associated with bad performance (Chancey, 2016). ViRTRAS addresses this gap between real-world and laboratory settings by introducing a subjectively experienceable situational risk into human-automation research that can be manipulated as part of the environment.

Established effects that were found under no or only limited risk, such as effects of human-automation function allocation, can now be revisited. Possibly, the finding that operators tend to invest fewer resources in automation verification when being supported by higher DOAs (Mosier & Manzey, 2019) has been overestimated in experiments in which participants do not have to face negative consequences. Using ViRTRAS, one can investigate whether participants show more appropriate automation verification when a situational risk is present. In addition, disregarding risk in the laboratory might also lead to the underestimation of effects. It is conceivable that the experience of an automation failure might have even longer lasting effects on participants’ behavior and the evaluation of the support (Muir & Moray, 1996) when the situational risk is high compared to low. In conclusion, by introducing a directly experienceable risk into the laboratory of human-automation research, ViRTRAS constitutes a new tool for human-automation research which has the potential to increase the transferability of findings from laboratory-based experiments to the real-world work context.

The new paradigm manages to add this central contextual factor of real-life working situations, which previously has been either neglected or operationalized in an arguably rather incomparable way. In contrast to prior attempts, the situational risk is not just hypothetical (e.g., Lyons & Stokes, 2012), manipulated by the mere prospect of a decreased monetary reward (e.g., Wiczorek & Manzey, 2014), or the delay of the completion of an experiment (e.g., Chancey, 2016). It rather creates the illusion of a situation where a mistake has severe simulated consequences for the integrity of the participants’ body. In this fashion, ViRTRAS incorporates a manipulation of situational risk that is more comparable to the situation of real-life operators, in a way that is safe and ethical.

TOPIC CHOICE

Simulation and Virtual Reality

Footnotes

Acknowledgments

Special thanks to Markus Ahrendt for programming the virtual environment.

KEY POINTS

In a newly developed multi-task paradigm (Virtual Reality Testbed for Risk and Automation Studies; ViRTRAS) risk can be varied as a directly experienceable part of the virtual environment

The operationalization via altitude was successful as it induced different perceptions of risk in subjective and objective measures

Participants habituated to the situation over time similarly in both altitude conditions

ViRTRAS constitutes a new tool for human-automation research which has the potential to distinctly increase the transferability of laboratory-based experiments to the real-world work context

ORCID iD

Steffen Hoesterey

Author biographies

Steffen Hoesterey is a research associate at the Institute of Psychology of the Humboldt-Universität zu Berlin, Germany, where he earned his M.Sc. in psychology in 2019. He is currently working on a PhD addressing contextual factors such as risk in human-automation interaction.

Linda Onnasch is professor of engineering psychology at the Humboldt-Universität zu Berlin, Germany. She earned her PhD in 2014 at the Technical University of Berlin on effects of reliability and function allocation in human-automation interaction.

References

Ahir, Govani, Gajera, & Shah (2019). Application on virtual reality for enhanced education learning, military training and sports. Augmented Human Research, 5(1), 7. https://doi.org/10.1007/s41133-019-0025-2

Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59. https://doi.org/10.1016/0005-7916(94)90063-9

Chancey, E. T. (2016). The effects of alarm system errors on dependence: Moderated mediation of trust with and without risk [Doctoral dissertation, Old Dominion University]. Old Dominion University Library. https://doi.org/10.25777/34GR-X929

Coelho, C. M., Waters, A. M., Hine, T. J., & Wallis (2009). The use of virtual reality in acrophobia research and treatment. Journal of Anxiety Disorders, 23(5), 563–574. https://doi.org/10.1016/j.janxdis.2009.01.014

Cohen (1988). Statistical power analysis for the behavioral sciences. Academic, 54.

Faul, Erdfelder, Buchner, & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149

German Institute for Standardization. (2010). Safety of machinery - General principles for design - Risk assessment and risk reduction (EN ISO Standard No. 12100:2010). German Institute for Standardization.

Grimm (2009). State-Trait-Anxiety Inventory nach Spielberger. Deutsche Lang- und Kurzversion. Methodenforum der Universität Wien. MF-Working Paper 2009/02.

Hoff, K. A., & Bashir (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57, 407–434. https://doi.org/10.1177/0018720814547570

Hösterey & Onnasch (2021). Manipulating situational risk in human-automation research – Validation of a new experimental paradigm in virtual reality. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 65, 1109–1113. https://doi.org/10.1177/1071181321651161

Huppert, Grill, & Brandt (2017). A new questionnaire for estimating the severity of visual height intolerance and acrophobia by a metric interval scale. Frontiers in Neurology, 8, 211. https://doi.org/10.3389/fneur.2017.00211

Jamieson, G. A., & Skraaning (2019). The absence of degree of automation trade-offs in complex work settings. Human Factors, 62, 516–529. https://doi.org/10.1177/0018720819842709

Jamieson, G. A., & Skraaning (2020). The harder they fall? A response to Wickens et al. (2019) regarding the generalizability of lumberjack predictions to complex work settings. Human Factors, 62, 535–539. https://doi.org/10.1177/0018720820904623

Lyons, J. B., & Stokes, C. K. (2012). Human–human reliance in the context of automation. Human Factors, 54, 112–121. https://doi.org/10.1177/0018720811427034

Manzey, Bleil, Bahner-Heyne, J. E., Klostermann, Onnasch, Reichenbach, & Röttger (2008). AutoCAMS 2.0. Manual. http://www.aio.tu-berlin.de/?id=30492

Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20(3), 709–734. https://doi.org/10.5465/amr.1995.9508080335

Mosier, K. L., & Manzey (2019). Humans and automated decision aids: A match made in heaven? In Mouloua, Hancock, P. A., & Ferraro (Eds.), Human performance in automated and autonomous systems (1st ed., pp. 19–42). CRC Press. https://doi.org/10.1201/9780429458330-2

Muir, B. M., & Moray (1996). Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39, 429–460. https://doi.org/10.1080/00140139608964474

O’Neill, McNeese, Barron, & Schelble (2020). Human–autonomy teaming: A review and analysis of the empirical literature. Human Factors. https://doi.org/10.1177/0018720820960865

Onnasch, Wickens, C. D., & Manzey (2014). Human performance consequences of stages and levels of automation: An integrated meta-analysis. Human Factors, 56, 476–488. https://doi.org/10.1177/0018720813501549

Pallavicini, Argenton, Toniazzi, Aceti, & Mantovani (2016). Virtual reality applications for stress management training in the military. Aerospace Medicine and Human Performance, 87(12), 1021–1030. https://doi.org/10.3357/AMHP.4596.2016

Parasuraman & Manzey, D. H. (2010). Complacency and bias in human use of automation: An attentional integration. Human Factors, 52, 381–410. https://doi.org/10.1177/0018720810376055

Parasuraman & Riley (1997). Humans and automation: use, misuse, disuse, abuse. Human Factors, 39, 230–253. https://doi.org/10.1518/001872097778543886

Parasuraman, Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 30(3), 286–297. https://doi.org/10.1109/3468.844354

Reising, D. V. C., & Sanderson, P. M. (2002). Work domain analysis and sensors II: Pasteurizer II case study. International Journal of Human-Computer Studies, 56(6), 597–637. https://doi.org/10.1006/ijhc.2002.1005

Rohrmann & Renn (2000). Risk perception research: An introduction. In Renn & Rohrmann (Eds.), Cross-Cultural Risk Perception (pp. 11–53). Springer US. https://doi.org/10.1007/978-1-4757-4891-8_1

Sanchez-Vives, M. V., & Slater (2005). From presence to consciousness through virtual reality. Nature Reviews Neuroscience, 6(4), 332–339. https://doi.org/10.1038/nrn1651

Santiago-Espada, Myer, R. R., Latorella, K. A., & Comstock, J. R. (2011). The Multi-Attribute Task Battery II (MATB-II) software for human performance and workload research: A user’s guide (NASA/TM-2011-217164). National Aeronautics and Space Administration, Langley Research Center.

Stuck, R. E., Holthausen, B. E., & Walker, B. N. (2021). The role of risk in human-robot trust. In Nam, C. S., & Lyons, J. B. (Eds.), Trust in Human-Robot Interaction (pp. 179–194). Academic Press. https://doi.org/10.1016/B978-0-12-819472-0.00008-3

Wickens, C. D., Onnasch, Sebok, & Manzey (2020). Absence of DOA effect but no proper test of the lumberjack effect: A reply to Jamieson and Skraaning (2019). Human Factors, 62, 530–534. https://doi.org/10.1177/0018720820901957

Wiczorek & Manzey (2014). Supporting attention allocation in multitask environments: Effects of likelihood alarm systems on trust, behavior, and performance. Human Factors, 56, 1209–1221. https://doi.org/10.1177/0018720814528534

Winslow, B. D., Carroll, M. B., Martin, J. W., Surpris, & Chadderdon, G. L. (2015). Identification of resilient individuals and those at risk for performance deficits under stress. Frontiers in Neuroscience, 9, 328. https://doi.org/10.3389/fnins.2015.00328

Wuehr, Breitkopf, Decker, Ibarra, Huppert, & Brandt (2019). Fear of heights in virtual reality saturates 20 to 40 m above ground. Journal of Neurology, 266(1), 80–87. https://doi.org/10.1007/s00415-019-09370-5

A New Experimental Paradigm to Manipulate Risk in Human-Automation Research
