Abstract
The average reward problem in traditional Monte-Carlo tree search (MCTS) algorithms has long troubled sudden death games, because the average reward criterion reduces the probability of reaching a deterministic result. This does not affect non-sudden death games such as Go, where the search can focus on higher win rates rather than higher scores. In this work, we apply miniMax-Monte-Carlo tree search with depth rewards to Outer-Open Gomoku (a variant of Gomoku) to discover forced wins and losses without any human knowledge, evaluation function, or pre-training. The method can solve not only the average reward problem but also the inaccurate win-rate problem that arises in deep playout simulation in sudden death games. Finally, we propose a new integrated framework named BBQ (Big, Best, Quick win) MCTS to improve the performance of traditional MCTS.
