RAG-SafeAdapt: a multimodal retrieval-augmented model for safety and interpretability in autonomous driving

Abstract

Autonomous driving systems are often criticized as black-box models, which raises transparency and trust concerns. To address these concerns, we propose RAG-SafeAdapt, a retrieval-augmented vision-language model designed for complex traffic scenarios. This end-to-end framework integrates the CARLA simulator with Retrieval-Augmented Generation (RAG) and multimodal knowledge bases. The system combines visual and language inputs and uses Responsibility-Sensitive Safety (RSS) principles and Vision-Language Models (VLMs) to generate game-theory-based safety recommendations. RAG-SafeAdapt enhances safety analysis and interpretable decision-making, and it is compatible with deployment on platforms such as NVIDIA Orin. Evaluation on datasets including Berkeley DeepDrive eXplanation (BDD-X) and NuScenes-QA demonstrates improved decision explainability and generalization across diverse driving scenarios. Experimental results show superior zero-shot generalization, enhanced transparency, and a reduced collision rate of 0.35%. The framework addresses key challenges in navigation clarity, sensor precision, and adaptability, and it fosters trust in autonomous driving through optimized real-time safety and human-readable explanations.
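For readers unfamiliar with RSS, the following is a minimal sketch of the standard RSS minimum longitudinal safe distance for two vehicles travelling in the same direction, as defined in the general RSS literature. It is not taken from RAG-SafeAdapt itself: the function names, parameter names, and all numeric values are illustrative assumptions, and the abstract does not state how the framework parameterizes or applies the rule.

```python
def rss_min_longitudinal_gap(v_rear, v_front, response_time,
                             a_max_accel, b_min_brake, b_max_brake):
    """Minimum safe gap in metres under the standard RSS longitudinal rule.

    v_rear, v_front : speeds of the rear and front vehicle (m/s)
    response_time   : reaction time of the rear vehicle (s)
    a_max_accel     : worst-case acceleration of the rear vehicle
                      during the response time (m/s^2)
    b_min_brake     : braking the rear vehicle is guaranteed to apply (m/s^2)
    b_max_brake     : hardest braking the front vehicle may apply (m/s^2)
    All names and values here are hypothetical, chosen for illustration.
    """
    # Speed of the rear vehicle after accelerating for the whole response time.
    v_rear_after = v_rear + response_time * a_max_accel
    gap = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time ** 2
        + v_rear_after ** 2 / (2.0 * b_min_brake)
        - v_front ** 2 / (2.0 * b_max_brake)
    )
    # The rule clamps negative values to zero.
    return max(gap, 0.0)


def is_gap_safe(actual_gap, **rss_params):
    """Return True when the measured gap meets the RSS minimum."""
    return actual_gap >= rss_min_longitudinal_gap(**rss_params)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not from the article.
    params = dict(v_rear=20.0, v_front=20.0, response_time=1.0,
                  a_max_accel=2.0, b_min_brake=4.0, b_max_brake=8.0)
    print(rss_min_longitudinal_gap(**params))
    print(is_gap_safe(40.0, **params))
```

A check of this kind yields a numeric margin that a language model could cite in a human-readable explanation, although whether RAG-SafeAdapt uses it this way is an assumption.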

Keywords

Retrieval-augmented generation, responsibility-sensitive safety, vision-language models, explainability, sensor fusion, autonomous driving
