Abstract
Multidisciplinary Design Optimization (MDO) is a computational approach for optimizing the design of a complex system of systems whose design requires knowledge from multiple disciplines. In an earlier study, we found that the individual discipline feasible (IDF) formulation, a type of MDO architecture, performed well on several benchmark decentralized Reinforcement Learning (RL) problems, in particular stabilizing an unknown system. That study, however, could not explain why the overall system of systems, even one with strongly coupled subsystems, could be stabilized when each agent focused only on stabilizing itself. In this work, we significantly extend that line of research by conducting a theoretical analysis of the MDO solution to RL problems. The analysis shows that, with a proper control law, each MDO agent can drive its state toward the zero-stable point regardless of how the other agents’ states affect the state of the whole system. This is the main reason the ‘selfish’ MDO-IDF agents succeed in learning to stabilize the overall system. Simulation results, including the benchmark test cases, verify our analysis. We therefore propose MDO as a promising solution to many other decentralized RL problems.
