This paper extends simulation adjusting, a method that optimizes the parameters of a policy function so that Monte-Carlo search produces correct moves, and gives simulation adjusting a firm theoretical basis. Preliminary experiments show that the improved simulation adjusting is capable of tuning 28 parameters of the policy function in a Go program.
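To make the idea concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: it assumes a log-linear policy p(a) ∝ exp(θ·φ(a)) over candidate moves and adjusts θ by the gradient of the log-likelihood of a known correct move (φ(a*) − E_p[φ]). The function names, feature vectors, and learning rate are all invented for illustration; the paper's method instead adjusts parameters so that the Monte-Carlo procedure as a whole selects correct moves.

```python
import math

def softmax_policy(theta, features):
    """Move probabilities under a log-linear policy: p(a) ∝ exp(θ·φ(a))."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def adjust(theta, features, correct_move, lr=0.1):
    """One gradient-ascent step on log p(correct_move).

    The gradient of log p(a*) with respect to θ is φ(a*) − E_p[φ],
    i.e. the correct move's features minus the policy's expected features.
    """
    probs = softmax_policy(theta, features)
    expected = [sum(p * phi[k] for p, phi in zip(probs, features))
                for k in range(len(theta))]
    return [t + lr * (features[correct_move][k] - expected[k])
            for k, t in enumerate(theta)]
```

Repeated calls to `adjust` raise the policy's probability of the designated correct move; in the paper's setting the feedback signal comes from Monte-Carlo simulation outcomes rather than this direct supervised gradient.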