Technical Report MSC-2018-02

Title: Restricted Optimism
Authors: Snir Cohen
Supervisors: Shie Mannor
PDF: Currently accessible only within the Technion network.
Abstract: Markov Decision Processes (MDPs) are commonly used to model Reinforcement Learning problems in which an agent interacts with an unknown environment, aiming to maximize a given criterion. In many real-world applications, a decision maker's task is to formulate a strategy for a fixed number of steps in order to achieve a certain objective. For example, in adaptive routing, packets must be sent through a network of routers before the maximum number of hops is reached and sending fails, while minimizing latency. These kinds of problems are referred to as finite-horizon or fixed-horizon problems. More examples can be found in portfolio management, power systems management, and games.

Optimistic methods for solving Reinforcement Learning problems are very popular in the literature. In practice, however, these methods show inferior performance compared to other methods, such as Posterior Sampling. We propose a novel concept of Restricted Optimism to balance the well-known exploration vs. exploitation trade-off for finite-horizon MDPs. We harness Posterior Sampling to construct two algorithms in the spirit of our Restricted Optimism principle. We provide theoretical guarantees for them and demonstrate through experiments that there exists a trade-off between the average cumulative regret suffered by the agent and its variance. The agent can influence this trade-off by tuning the level of optimism exercised by our proposed algorithms through a regularization parameter.
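To make the Posterior Sampling setting concrete, the following is a minimal sketch of Posterior Sampling for Reinforcement Learning (PSRL) on a tabular finite-horizon MDP. All sizes, priors, and variable names here are illustrative assumptions for exposition; this is not the report's Restricted Optimism algorithm, only the baseline idea it builds on: sample an MDP from the posterior, plan optimally in it by backward induction over the horizon, act, and update the posterior.

```python
import numpy as np

# Sketch of Posterior Sampling for RL (PSRL) on a finite-horizon tabular MDP.
# The MDP sizes, priors, and reward model below are assumptions for illustration.

rng = np.random.default_rng(0)
S, A, H = 3, 2, 5  # states, actions, horizon (assumed sizes)

# Hidden dynamics and mean rewards, used only to simulate the environment.
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] is a next-state distribution
R_true = rng.uniform(size=(S, A))                # mean rewards in [0, 1]

# Dirichlet posterior over transitions (uniform prior via counts of 1),
# and a simple running-mean estimate of rewards.
counts = np.ones((S, A, S))
r_sum = np.zeros((S, A))
r_n = np.ones((S, A))

def backward_induction(P, R):
    """Finite-horizon dynamic programming; returns a greedy policy per step."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V          # Q[s, a] = R[s, a] + sum_{s'} P[s, a, s'] * V[s']
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

n_episodes = 200
for episode in range(n_episodes):
    # 1) Sample an MDP from the posterior.
    P_hat = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)] for s in range(S)])
    R_hat = r_sum / r_n
    # 2) Plan optimally in the sampled MDP.
    pi = backward_induction(P_hat, R_hat)
    # 3) Execute the policy for one episode and update the posterior.
    s = 0
    for h in range(H):
        a = pi[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        r = R_true[s, a] + rng.normal(scale=0.1)
        counts[s, a, s_next] += 1
        r_sum[s, a] += r
        r_n[s, a] += 1
        s = s_next
```

Randomness in the sampled MDP drives exploration here; a Restricted Optimism variant would additionally bias the planning step toward optimistic models, with the degree of bias controlled by a regularization parameter.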

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer Science Department, Technion