Polygames: Improved zero learning

Abstract

Since DeepMind’s AlphaZero, Zero learning has quickly become the state-of-the-art method for many board games. It can be improved by using a fully convolutional structure (no fully connected layer). Combining such an architecture with global pooling, we can create bots that are independent of the board size. Training can be made more robust by keeping track of the best checkpoints during training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex on a $19 \times 19$ board, including the human player with the best Elo rank on LittleGolem; incidentally, we also won against another Zero implementation, which was weaker than humans: in a discussion on LittleGolem, Hex19 was said to be intractable for Zero learning. We also won at Havannah with size 8, beating the strongest player, namely Eobllor, with excellent opening moves. Finally, we won several first places at the TAAI 2019 competitions and obtained positive results against strong bots in various games.
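The board-size independence claimed above can be illustrated with a minimal sketch. It is not the Polygames network: it assumes a single input channel, one 3x3 convolution layer, and hypothetical names and weights (`conv3x3_same`, `global_pool`, `forward`, `KERNEL`, `VALUE_WEIGHTS`) chosen only for illustration. The point is that the policy head emits one logit per cell and the value head reads a globally pooled summary, so the same parameters apply to any board size.

```python
from typing import List, Tuple

Board = List[List[float]]


def conv3x3_same(board: Board, kernel: Board) -> Board:
    """Apply a 3x3 filter with zero padding; output has the input's size."""
    rows, cols = len(board), len(board[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[dr + 1][dc + 1] * board[rr][cc]
            out[r][c] = total
    return out


def global_pool(feature_map: Board) -> List[float]:
    """Global average and max pooling: always two numbers, for any board size."""
    cells = [v for row in feature_map for v in row]
    return [sum(cells) / len(cells), max(cells)]


def forward(board: Board, kernel: Board,
            value_weights: List[float]) -> Tuple[Board, float]:
    """Return (policy_logits, value) for a square board of any size."""
    conv = conv3x3_same(board, kernel)
    features = [[max(0.0, v) for v in row] for row in conv]  # ReLU
    policy_logits = features             # fully convolutional: one logit per cell
    pooled = global_pool(features)       # fixed-size summary of the whole board
    value = sum(w * p for w, p in zip(value_weights, pooled))
    return policy_logits, value


# Made-up parameters: 9 kernel weights and 2 value weights, whatever the board size.
KERNEL = [[0.0, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.0]]
VALUE_WEIGHTS = [1.0, -0.5]

for size in (5, 11, 19):
    board = [[0.0] * size for _ in range(size)]
    board[size // 2][size // 2] = 1.0    # a single stone in the centre
    logits, value = forward(board, KERNEL, VALUE_WEIGHTS)
    print(size, len(logits), len(logits[0]), value)
```

A fully connected layer would instead tie the parameter count to the number of cells, which is why removing it is a prerequisite for reusing one model across board sizes.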

Keywords

Zero learning, board games

