Abstract

The average reward problem in traditional Monte-Carlo tree search (MCTS) algorithms has long troubled sudden death games, because the average reward criterion reduces the probability of reaching a deterministic result. The problem does not affect non-sudden-death games such as Go, where play can focus on a higher win rate rather than a higher score. In this work, we apply miniMax-Monte-Carlo tree search with depth rewards to Outer-Open Gomoku (a variant of Gomoku) to discover forced wins and losses without any human knowledge, evaluation function, or pre-training. The method addresses not only the average reward problem but also the inaccurate win rates that arise from deep playout simulation in sudden death games. Finally, we propose a new integrated framework, named BBQ (Big, Best, Quick win) MCTS, for improving the performance of traditional MCTS.
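The average reward problem described in the abstract can be illustrated with a toy game tree. The sketch below is not the authors' algorithm, which the abstract does not specify in detail. It only contrasts an average backup with a minimax backup whose terminal rewards shrink with depth. The tree, the helper names, and the per-ply decay `GAMMA` are all assumptions made for illustration.

```python
from statistics import mean

GAMMA = 0.9  # hypothetical per-ply decay; the paper's depth reward may differ


def is_terminal(node):
    # A terminal node is a number: +1 (root player wins) or -1 (root player loses).
    return isinstance(node, (int, float))


def average_value(node):
    # Expected outcome when both sides move uniformly at random,
    # which is what an average-reward backup converges to.
    if is_terminal(node):
        return float(node)
    return mean(average_value(child) for child in node)


def minimax_depth_value(node, depth, maximizing):
    # Minimax backup with a depth-scaled terminal reward:
    # quick wins score higher, quick losses score lower.
    if is_terminal(node):
        return node * GAMMA ** depth
    values = [minimax_depth_value(c, depth + 1, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


def best_move(root, evaluate):
    # Index of the root move with the highest evaluation.
    scores = [evaluate(child) for child in root]
    return max(range(len(scores)), key=scores.__getitem__)


# Each root child is a position where the opponent moves next.
move_a = [1, 1, 1, -1]          # mostly wins, but one reply is a forced loss
move_b = [[1, -1], [1, -1]]     # forced win in 3 plies
move_c = [[[1], -1, -1]]        # forced win in 4 plies
tree = [move_a, move_b, move_c]

by_average = best_move(tree, average_value)
by_minimax = best_move(tree, lambda n: minimax_depth_value(n, 1, False))
print(by_average, by_minimax)
```

Under these assumptions the average backup prefers `move_a`, whose many winning replies hide a single refutation, while the minimax backup rejects it and the depth scaling breaks the tie between the two forced wins in favour of the quicker one.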

Keywords

Monte-Carlo tree search; miniMax search; depth rewards; sudden death games; Outer-Open Gomoku; Tic-Tac-Toe

Some improvements in Monte Carlo tree search algorithms for sudden death games