Abstract
The dominant paradigm for computer-Go players is Monte-Carlo Tree Search (MCTS). This algorithm builds a search tree by playing many simulated games (playouts). Each playout consists of a sequence of moves within the tree followed by many moves beyond the tree. Moves beyond the tree are generated by a biased random sampling policy. This note presents a dynamic sampling policy that takes advantage of information from previous playouts. The policy makes moves that, in previous playouts, have been successful replies to immediately preceding moves. Experimental results show that the policy provides a large improvement in playing strength.
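The reply-based policy described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name, method names, and data structures are all assumptions. The core idea is a table mapping each move to the reply that most recently succeeded against it in a playout won by the replying player; during a playout, the stored reply is used when legal, otherwise the policy falls back to random sampling.

```python
import random

class LastGoodReplyPolicy:
    """Hypothetical sketch of a dynamic playout policy that remembers,
    for each move, a reply that previously led to a win."""

    def __init__(self):
        # Maps a previous move -> the remembered successful reply to it.
        self.good_reply = {}

    def choose_move(self, previous_move, legal_moves):
        """Play the stored reply to previous_move if it is legal;
        otherwise fall back to a uniformly random legal move."""
        reply = self.good_reply.get(previous_move)
        if reply in legal_moves:
            return reply
        return random.choice(legal_moves)

    def update(self, moves, winner_moves):
        """After a playout, record each move made by the winner as the
        good reply to the move that immediately preceded it."""
        for prev, move in zip(moves, moves[1:]):
            if move in winner_moves:
                self.good_reply[prev] = move
```

A biased-random base policy (rather than `random.choice`) would normally supply the fallback; the table lookup simply takes priority when it yields a legal move.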