
Events and Talks at the Henry and Marilyn Taub Faculty of Computer Science

Offline Meta-RL: Applicable Ambiguity Alleviation
Gal Avineri (M.Sc. seminar lecture)
Thursday, 14.12.2023, 12:30
Zoom lecture: 94960036903, and Taub 601
Advisors: Prof. Aviv Tamar, Prof. Shie Mannor
In meta reinforcement learning (meta-RL), an agent seeks an optimal policy when facing a new, unseen task sampled from a known task distribution. Such a policy strikes an effective trade-off between information gathering and reward accumulation. The offline variant of meta-RL (OMRL) makes learning such a policy challenging: previous work established an identifiability problem in OMRL, termed MDP ambiguity, which concerns the difficulty of learning a neural network that can infer the task at hand at test time. We propose a new method that utilizes prior knowledge of the task distribution to mitigate the identifiability problem in OMRL. Additionally, we propose a novel method to evaluate an inference model offline, which is more efficient and accurate than the online alternative of policy optimization. Finally, we show that the offline version of the popular VariBAD algorithm can learn a suboptimal representation for task inference, and propose a simple modification that uses contrastive predictive coding to improve its performance. We compare our methods to Offline VariBAD on two ambiguity-prone tasks and demonstrate results that are on par with or better than policy replay, a state-of-the-art method for solving MDP ambiguity, while requiring weaker assumptions.
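For readers unfamiliar with contrastive predictive coding, the sketch below illustrates the general idea behind an InfoNCE-style contrastive loss applied to task-inference embeddings. It is a minimal illustration only, not the speaker's implementation: the function name, the use of context/future trajectory embeddings, and the temperature value are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(context_emb: torch.Tensor,
                  future_emb: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE contrastive loss (illustrative, not the talk's exact method).

    context_emb: (batch, dim) embeddings of observed trajectory context.
    future_emb:  (batch, dim) embeddings of the corresponding future segments.
    Each context is trained to score highest against its own future segment,
    with the other items in the batch acting as negatives.
    """
    context_emb = F.normalize(context_emb, dim=-1)
    future_emb = F.normalize(future_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the positives.
    logits = context_emb @ future_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

Under this kind of objective, embeddings of trajectory segments from the same task are pulled together and those from different tasks pushed apart, which is one way a contrastive term could sharpen a task-inference representation.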