Abstract
We study online interaction in general decision-making problems, where the objective is not only to find optimal strategies but also to satisfy safety guarantees expressed in terms of accrued costs. In particular, we focus on the online learning problem in which an agent must find the optimal solution of a linear objective while satisfying a linear safety constraint at each round. We propose a theoretical framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, in an online interaction with an unknown environment, attains sublinear regret of order
