In this work we investigate planing in Markov Decision Processes (MDP) whose state space is exponential (or infinite). To describe the MDP, we assume a generative model that receives a state and action, and outputs the next state (according to the distribution given by the MDP).
We show that, given a generative model, one can compute efficiently a near-optimal policy (in time which is independent from the size of the underlying MDP). Our technique can be viewed as a form of sparse sampling in an MDP.
Similarly, we address the question of finding a near-optimal policy from a limited class of policies. We show that for a finite class of policies, it is sufficient that the number of queries to the generative model grows only logarithmically in the number of policies. We also show that for an infinite class of policies, one can use the VC-dimension to bound the number of queries to the generative model. The talk will not assume any prior knowledge about Markov Decision Processes.
This is a joint work with Michael Kearns (AT&T) and Andrew Ng (U. C. Berkeley)