Abstract
The dominant paradigm for computer-Go players is Monte-Carlo Tree Search (MCTS). This algorithm builds a search tree by playing many simulated games (playouts). Each playout consists of a sequence of moves within the tree followed by many moves beyond the tree. Moves beyond the tree are generated by a biased random sampling policy. This note presents a dynamic sampling policy that takes advantage of information from previous playouts. The policy makes moves that, in previous playouts, have been successful replies to immediately preceding moves. Experimental results show that the policy provides a large improvement in playing strength.
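The reply-based policy described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name, method names, and data structures are all assumptions. The core idea is a table mapping each move to the reply that most recently succeeded against it in a playout won by the replying player; during a playout, the stored reply is used when legal, otherwise the policy falls back to random sampling.

```python
import random

class LastGoodReplyPolicy:
    """Hypothetical sketch of a dynamic playout policy that remembers,
    for each move, a reply that previously led to a win."""

    def __init__(self):
        # Maps a previous move -> the remembered successful reply to it.
        self.good_reply = {}

    def choose_move(self, previous_move, legal_moves):
        """Play the stored reply to previous_move if it is legal;
        otherwise fall back to a uniformly random legal move."""
        reply = self.good_reply.get(previous_move)
        if reply in legal_moves:
            return reply
        return random.choice(legal_moves)

    def update(self, moves, winner_moves):
        """After a playout, record each move made by the winner as the
        good reply to the move that immediately preceded it."""
        for prev, move in zip(moves, moves[1:]):
            if move in winner_moves:
                self.good_reply[prev] = move
```

A biased-random base policy (rather than `random.choice`) would normally supply the fallback; the table lookup simply takes priority when it yields a legal move.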