Active uncertainty reduction for safe and efficient interaction planning: A shielding-aware dual control approach

Abstract

The ability to accurately predict others’ behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human–robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents’ goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between the robot’s trajectory plan and its prediction of other agents’ intent. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a “shielding” scheme), which overrides the robot’s dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors. 
We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.
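
As an illustration only, two of the mechanisms named above, runtime inference of a hidden parameter and a safety filter that overrides the nominal policy, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the algorithm proposed in the paper: the two intent hypotheses, the Gaussian observation model, the function names `gaussian_likelihood`, `update_belief`, and `shield`, and the distance-threshold safety check are all hypothetical choices made for this example. It shows only the passive belief update; the active (dual control) part, in which the planner chooses actions partly for the information they are expected to reveal, is not implemented here.

```python
import math


def gaussian_likelihood(observed, predicted, sigma=0.5):
    """Unnormalized Gaussian likelihood of an observation given a prediction."""
    return math.exp(-0.5 * ((observed - predicted) / sigma) ** 2)


def update_belief(belief, likelihoods):
    """Bayes' rule over a categorical hidden parameter (e.g. another agent's intent)."""
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # No hypothesis explains the observation; keep the prior unchanged.
        return list(belief)
    return [u / total for u in unnormalized]


def shield(nominal_action, fallback_action, gap, min_gap):
    """Safety filter: use the fallback action when the gap drops below min_gap.

    The distance threshold is an assumed stand-in for a real safety monitor.
    """
    return fallback_action if gap < min_gap else nominal_action


# Hypothetical hypotheses: predicted acceleration of the other agent (m/s^2).
predicted_accel = {"yielding": -1.0, "not_yielding": 1.0}

belief = [0.5, 0.5]      # uniform prior over the two hypotheses
observed_accel = -0.8    # assumed measurement of the other agent

likelihoods = [gaussian_likelihood(observed_accel, p) for p in predicted_accel.values()]
belief = update_belief(belief, likelihoods)

# The nominal plan is overridden because the assumed gap is below the threshold.
action = shield(nominal_action="merge", fallback_action="brake", gap=3.0, min_gap=5.0)
```

In this toy setting the observation lies much closer to the "yielding" prediction, so the posterior mass shifts toward that hypothesis, while the filter replaces the nominal action only when the assumed safety condition is violated.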

Keywords

Planning under uncertainty, human–robot interaction, dual control theory, stochastic model predictive control, safe learning
