Events


Restricted Optimism via Posterior Sampling
Speaker: Snir Cohen, M.Sc. seminar lecture
Date: Wednesday, 25.10.2017, 14:00
Location: Taub 601
Optimistic methods for solving Reinforcement Learning problems are very popular in the literature. In practice, however, these methods perform worse than other methods, such as Posterior Sampling. We propose a novel concept, Restricted Optimism, to balance the well-known exploration-exploitation trade-off in finite-horizon MDPs. We harness Posterior Sampling to construct two algorithms in the spirit of the Restricted Optimism principle. We provide theoretical guarantees for both algorithms and demonstrate through experiments that there is a trade-off between the average cumulative regret suffered by the agent and the variance. The agent can influence this trade-off by tuning the level of optimism of the proposed algorithms through a regularization parameter.