Home | Publications | CS Home

Exploration Strategies for Model-based Learning in Multiagent Systems


David Carmel and Shaul Markovitch. Exploration Strategies for Model-based Learning in Multiagent Systems. Autonomous Agents and Multi-agent Systems, 2:141-172, 1999.


Abstract

An agent that interacts with other agents in multi-agent systems can benefit significantly from adapting to the others. When performing active learning, every agentŐs action affects the interaction process in two ways: The effect on the expected reward according to the current knowledge held by the agent, and the effect on the acquired knowledge, and hence, on future rewards expected to be received. The agent must therefore make a tradeoff between the wish to exploit its current knowledge, and the wish to explore other alternatives, to improve its knowledge for better decisions in the future. The goal of this work is to develop exploration strategies for a model-based learning agent to handle its encounters with other agents in a common environment. We first show how to incorporate exploration methods usually used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration Ń an exploratory action taken by the agent can yield a better model of the other agent but also carries the risk of putting the agent into a much worse position. We present the lookahead-based exploration strategy that evaluates actions according to their expected utility, their expected contribution to the acquired knowledge, and the risk they carry. Instead of holding one model, the agent maintains a mixed opponent model, a belief distribution over a set of models that reflects its uncertainty about the opponentŐs strategy. Every action is evaluated according to its long run contribution to the expected utility and to the knowledge regarding the opponentŐs strategy. Risky actions are more likely to be detected by considering their expected outcome according to the alternative models of the opponentŐs behavior. We present an efficient algorithm that returns an almost optimal exploration plan against the mixed model and provide a proof of its correctness and an analysis of its complexity. We report experimental results in the Iterated PrisonerŐs Dilemma domain, comparing the capabilities of the different exploration strategies. The experiments demonstrate the superiority of lookahead-based exploration over other exploration methods.


Keywords: Opponent Modeling, Exploration, Active Learning
Secondary Keywords: Lookahead, Exploration vs. Exploitation, Learning DFA
Online version:
Bibtex entry:
 

@article{Carmel:1999:ESM,
  Author =      {David Carmel and Shaul Markovitch},
  Title =       {Exploration Strategies for Model-based Learning in Multiagent Systems},
  Year =        {1999},
  Journal =     {Autonomous Agents and Multi-agent Systems},
  Volume =      {2},
  Number =      {2},
  Pages =       {141--172},
  Keywords =    {Opponent Modeling, Exploration, Active Learning},
  Secondary-keywords =  {Lookahead, Exploration vs. Exploitation, Learning DFA},
  Url =         {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Carmel-Markovitch-aamas1999.pdf},
  Abstract =    {An agent that interacts with other agents in multi-agent systems can benefit
             significantly from adapting to the others. When performing active learning,
             every agentŐs action affects the interaction process in two ways: The effect on
             the expected reward according to the current knowledge held by the agent, and
             the effect on the acquired knowledge, and hence, on future rewards expected to
             be received. The agent must therefore make a tradeoff between the wish to
             exploit its current knowledge, and the wish to explore other alternatives, to
             improve its knowledge for better decisions in the future. The goal of this work
             is to develop exploration strategies for a model-based learning agent to handle
             its encounters with other agents in a common environment. We first show how to
             incorporate exploration methods usually used in reinforcement learning into
             model-based learning. We then demonstrate the risk involved in exploration Ń an
             exploratory action taken by the agent can yield a better model of the other
             agent but also carries the risk of putting the agent into a much worse
             position. We present the lookahead-based exploration strategy that evaluates
             actions according to their expected utility, their expected contribution to the
             acquired knowledge, and the risk they carry. Instead of holding one model, the
             agent maintains a mixed opponent model, a belief distribution over a set of
             models that reflects its uncertainty about the opponentŐs strategy. Every
             action is evaluated according to its long run contribution to the expected
             utility and to the knowledge regarding the opponentŐs strategy. Risky actions
             are more likely to be detected by considering their expected outcome according
             to the alternative models of the opponentŐs behavior. We present an efficient
             algorithm that returns an almost optimal exploration plan against the mixed
             model and provide a proof of its correctness and an analysis of its complexity.
             We report experimental results in the Iterated PrisonerŐs Dilemma domain,
             comparing the capabilities of the different exploration strategies. The
             experiments demonstrate the superiority of lookahead-based exploration over
             other exploration methods.}
}