Abstract
The article describes A
Our approach is based on a risk-seeking reinforcement-learning algorithm, since defensive players in Abalone tend to prolong a game indefinitely. We show that risk sensitivity enables successful self-play training. Moreover, we propose a set of features that seem relevant for achieving a rather skilled level of play.
We evaluate our approach using a fixed heuristic opponent as a benchmark. We also pit our agents against human players online and compare samples of our agents taken at different stages of training.
