First steps toward behavioral models for fire evacuation simulations using the Unity ML-Agents toolkit

Abstract

This paper explores an initial attempt to use the Unity ML-Agents toolkit to model the behavior of people evacuating from indoor fires. The virtual environment was created in the Unity game engine and populated with humanoid agents capable of moving autonomously within the scene. Each agent perceives information from the rendered environment, such as surfaces, directions, and line-of-sight depth, and uses it to navigate toward the nearest exit. Agents were trained through reinforcement learning, using the Proximal Policy Optimization (PPO) algorithm to balance rewards and penalties for their actions. We tested five different reward schemes in single-agent simulations to observe how they affect navigation behavior. Among them, the scheme referred to as mark5 produced the most plausible and efficient evacuation strategy, reaching the exit quickly while avoiding collisions. The agent trained with this scheme was then used in multi-agent settings, where its performance remained stable even with groups of up to 20 individuals. These first results suggest that Unity ML-Agents can offer a practical foundation for building more realistic and adaptive evacuation models.
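For readers unfamiliar with PPO, the following minimal sketch illustrates the clipped surrogate objective that the algorithm maximizes during a policy update. It is not the training code used in this work: the function name `ppo_clipped_objective`, the pure-Python formulation, and the clipping value of 0.2 are assumptions chosen for illustration only.

```python
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed illustrative value, not taken from this paper
) -> float:
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # A rewarded action (positive advantage) whose probability grew too much,
    # and a penalized action (negative advantage) whose probability shrank too much.
    print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is the mechanism that keeps each update small regardless of how the reward scheme is defined.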

Keywords

Reinforcement learning; Unity ML-Agents; behavioral simulation; Proximal Policy Optimization (PPO); agent-based modeling; evacuation crowd dynamics

