Comprehensive Evaluation of Explanation Types in a Spaceflight-Relevant Human–Autonomy Teaming Task

Abstract

Objective

This study evaluates how explanation type in an explainable AI (XAI) human–autonomy teaming (HAT) task affects performance, workload, trust, situation awareness (SA), and preference in a dynamic, spaceflight-relevant simulator. It also introduces a holistic evaluation method for comparing XAI systems across multiple outcomes.

Background

XAI aims to improve understanding, calibrate trust, and enhance performance of an HAT, but the impact of explanation type in realistic, high-taskload HAT settings remains underexplored.

Method

Participants ($N = 31$) completed 18 trials in a dual-task simulator requiring manual rover driving while supervising an autonomous exploration agent. Participants received various combinations of global, contrastive, and deductive explanations for AI-generated routes, with incentives tied to performance.

Results

Explanation type significantly affected manual performance ($p = 0.0003$), autonomy performance ($p < 0.0001$), team performance ($p < 0.0001$), workload ($p < 0.0001$), trust ($p < 0.0001$), and preference ($p = 0.001$), but not SA ($p = 0.41$). Participants preferred global and contrastive explanations and performed better with their preferred explanation ($p = 0.049$).

Conclusion

Explanation type influences performance and perception in demanding HAT contexts. A standardized, multi-metric evaluation framework is essential for understanding tradeoffs in XAI design.
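As a rough illustration of what a multi-metric comparison can look like, the sketch below standardizes each metric across conditions, flips the sign of metrics where lower is better, and averages the result into one composite score per condition. This is not the evaluation method introduced in the study, which the abstract does not detail. The condition labels (`A`, `B`, `C`), the metric names, the numbers, and the `composite_scores` helper are all hypothetical placeholders.

```python
from statistics import mean, pstdev

# Hypothetical placeholder scores, NOT data from the study.
# Each metric maps a condition label to that condition's mean score.
scores = {
    "team_performance": {"A": 70.0, "B": 80.0, "C": 90.0},
    "workload": {"A": 60.0, "B": 50.0, "C": 40.0},
    "trust": {"A": 4.0, "B": 5.0, "C": 6.0},
}

# Metrics where a lower raw value is better; their z-scores are sign-flipped.
lower_is_better = {"workload"}


def composite_scores(scores, lower_is_better):
    """Average per-metric z-scores for each condition (higher is better)."""
    conditions = sorted(next(iter(scores.values())))
    per_condition = {c: [] for c in conditions}
    for metric, by_condition in scores.items():
        values = [by_condition[c] for c in conditions]
        mu, sigma = mean(values), pstdev(values)
        for c in conditions:
            # A metric with no spread across conditions contributes zero.
            z = 0.0 if sigma == 0 else (by_condition[c] - mu) / sigma
            per_condition[c].append(-z if metric in lower_is_better else z)
    return {c: mean(zs) for c, zs in per_condition.items()}


result = composite_scores(scores, lower_is_better)
best = max(result, key=result.get)
print(result, best)
```

Equal weighting is itself a design choice. A composite like this can hide tradeoffs between metrics, so the per-metric values are worth reporting alongside it.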

Application

In HAT tasks such as space exploration, where users must make decisions quickly alongside an AI teammate, designers must choose the explanation type deliberately. In our human-centered evaluation, the combination of contrastive and global explanations performed best in our HAT task across a range of performance and preference metrics.

Keywords

autonomous agents, testing and evaluation, mental workload, trust in automation, artificial intelligence
