Title: Abstraction for devising compact controllers for MDPs
Abstract: The ability to plan a course of action is crucial for intelligent systems. It involves the representation of actions and world models, reasoning about the effects of actions, and techniques for efficiently searching the space of possible plans. Planning under uncertainty is captured by decision-theoretic planning (DTP), where actions have stochastic effects and the goal is to devise a policy of acting with high expected utility. This is in contrast to deterministic planning, where a sequence of actions that takes an agent from an initial state to a goal state is sought. The classic mathematical framework for decision-theoretic planning is the Markov decision process (MDP). Research on AI planning, reasoning under uncertainty, decision analysis, and operations research has given rise to the interesting insight that real-world DTP problems typically exhibit considerable structure. One of the most popular structures is exploited by the use of variable-based representations to describe problems, as is common practice in planning. Such variable-based representations allow for a compact description of the huge state space, but cast doubt upon the viability of most standard solutions for MDPs, i.e., algorithms that compute the optimal policy assuming an explicit state space. Over the last two decades, several works have presented solutions for MDPs with variable-based representations. Factored MDPs (Boutilier et al.) use an implicit variable-based state space and a dynamic Bayesian network (DBN) as a compact representation of the transition model. Most works have approximated the solution using an approximate value function with a compact representation given by a linear combination of possibly non-linear basis functions (Bellman et al., 1963; Sutton, 1988; Tsitsiklis et al., 1997). Guestrin et al.
(2003) proposed a similar solution, built on the ideas of Koller & Parr (1999, 2000) and using factored (linear) value functions, where each basis function is restricted to some small subset of the state variables. Dean & Givan (1997) offered a somewhat different approach in which the model is compressed in much the same way that a finite state machine is reduced to its equivalent minimal finite state machine. While previous works on factored MDPs approximate the value function of the original MDP, in this work we explore a different approach: calculating the exact value function of an approximated MDP. We exploit and extend a technique known as over-approximating abstractions to approximately solve the exponential-state-space MDP problem. An abstraction can, in general, be seen as a mapping that reduces the size of the state space by compacting several states into one. If the abstract state space is made small enough, the standard solutions for explicit state spaces become feasible for it as well. Over-approximating abstractions, adapted to the context of deterministic planning by Helmert, Haslum & Hoffmann, are based on the merge-and-shrink methodology introduced by Dräger, Finkbeiner & Podelski. We depart from these methodologies, adapting the merge-and-shrink abstraction technique to devise compact controllers for MDPs. We introduce the notion of action abstractions to extend the merge-and-shrink abstraction technique for MDPs and for deterministic planning. Finally, we provide a clear testbed evaluation of our methods and compare them to other state-of-the-art approaches.
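The two ingredients the abstract combines can be sketched in a few lines of code: an explicit-state MDP solved exactly by value iteration, and an abstraction mapping that merges several concrete states into one so the same solver runs on a much smaller model. The chain MDP, the two-block partition, and the uniform-averaging aggregation below are all hypothetical choices made for illustration; they are not the report's over-approximating merge-and-shrink construction.

```python
# Illustrative sketch only: a tiny explicit-state MDP solved by value
# iteration, then solved again after an abstraction compacts several
# concrete states into one abstract state.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-8):
    """P[(s, a)] is a list of (prob, next_state); R[(s, a)] is a reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[(s, a)] + gamma * sum(p * V[t] for p, t in P[(s, a)])
                       for a in actions)
            delta, V[s] = max(delta, abs(best - V[s])), best
        if delta < eps:
            return V

# A hypothetical 4-state chain: "right" advances with prob 0.9; reward 1 in state 3.
states, actions = [0, 1, 2, 3], ["stay", "right"]
P, R = {}, {}
for s in states:
    P[(s, "stay")] = [(1.0, s)]
    P[(s, "right")] = [(0.9, min(s + 1, 3)), (0.1, s)]
    for a in actions:
        R[(s, a)] = 1.0 if s == 3 else 0.0

V = value_iteration(states, actions, P, R)  # exact, explicit state space

# Abstraction: map concrete states onto two abstract states, then build an
# abstract model by averaging rewards and transitions uniformly over each
# block (one simple aggregation choice among several possible).
alpha = {0: "A", 1: "A", 2: "B", 3: "B"}
blocks = {"A": [0, 1], "B": [2, 3]}
aP, aR = {}, {}
for b, members in blocks.items():
    for a in actions:
        aR[(b, a)] = sum(R[(s, a)] for s in members) / len(members)
        mass = {}
        for s in members:
            for p, t in P[(s, a)]:
                mass[alpha[t]] = mass.get(alpha[t], 0.0) + p / len(members)
        aP[(b, a)] = [(p, t) for t, p in mass.items()]

aV = value_iteration(list(blocks), actions, aP, aR)  # exact on the abstract MDP
```

The abstract MDP here has two states instead of four, and its exact value function is cheap to compute; with a suitably chosen (e.g., over-approximating) aggregation rule, such an abstract value function can bound or stand in for the concrete one, which is the spirit of the approach described above.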
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.
Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2011/MSC/MSC-2011-21), rather than to the URL of the PDF files directly. The latter URLs may change without notice.