Abstract
The study of human-artificial intelligence (AI) teaming (HAT) and human digital twin (HDT) modeling currently faces significant challenges in accurately simulating and assessing the effectiveness of interactions between humans and AI systems. Current methods typically rely on limited real-world data or on simplified simulated representations that fail to capture the complexity and variability of HDT and AI agent behavior. VirTLab-Eval is a novel agentic framework of HDTs and AI agents for modeling and evaluating HAT behaviors and interventions across operational scenarios. A comprehensive set of measures for evaluating HAT performance, including novel socio-cognitive-emotional and behavioral metrics automatically extracted from team communications and interactions, is integrated into visual analytics that enable the assessment of interventions targeting the attributes of HDTs and agents. VirTLab-Eval's measurements capture nuanced aspects of trust development, emotional alignment, and team cohesion that conventional performance metrics might miss, providing richer insight into HAT dynamics. Example results from search and rescue missions indicate that AI teammate reliability has a significant effect on communication dynamics and assistance behaviors, while HDT personality traits shape trust development and team coordination. These insights directly inform the design of HAT training programs aimed at optimizing the use of AI systems, mitigating both over-reliance and under-utilization, and ultimately enhancing mission effectiveness while reducing risk to personnel.
