Skip to content (access key 's')
Logo of Technion
Logo of CS Department
Logo of CS4People
Events

The Taub Faculty of Computer Science Events and Talks

Restricted Optimism via Posterior Sampling
event speaker icon
Snir Cohen (M.Sc. Thesis Seminar)
event date icon
Wednesday, 25.10.2017, 14:00
event location icon
Taub 601
event speaker icon
Advisor: Prof. S. Mannor
Optimistic methods for solving Reinforcement Learning problems are very popular in the literature. In practice, however, these methods show inferior performance compared to other methods, such as Posterior Sampling. We propose a novel concept of Restricted Optimism to balance the well known exploration vs. exploitation trade-off for finite-horizon MDPs. We harness Posterior Sampling to construct two algorithms in the spirit of our Restricted Optimism principle. We provide theoretical guarantees for them and demonstrate through experiments that there exists a trade-off between the average cumulative regret suffered by the agent and the variance. The agent can influence this trade-off by tuning the level of optimism carried out by our proposed algorithms through a regularization parameter.