Home | Publications | CS Home

How to explore your opponent's strategy (almost) optimally


David Carmel and Shaul Markovitch. How to explore your opponent's strategy (almost) optimally. In Proceedings of the Third International Conference on Multi-Agent Systems, pages 64-71, Paris, France, 1998.


Abstract

This work presents a lookahead-based exploration strategy for a model-based learning agent that enables exploration of the opponent's strategy during interaction in a multi-agent system. Instead of holding one model, the model-based agent maintains a mixed opponent model, a distribution over a set of models that reflects its uncertainty about the opponent's strategy. Every action is evaluated according to its long run contribution to the expected utility and to the knowledge regarding the opponent's strategy. We present an efficient algorithm that returns an almost optimal exploration strategy against a given mixed model, and a learning method for acquiring a mixed model consistent with the opponent's past behavior. We report experimental results in the Iterated Prisoner's Dilemma game that demonstrate the superiority of the lookahead-based exploration strategy over other exploration methods.


Keywords: Opponent Modeling, Exploration, Active Learning, Learning in Games
Secondary Keywords: Lookahead, Exploration vs. Exploitation, Learning DFA
Online version:
Bibtex entry:
 

@inproceedings{Carmel:1998:HEY,
  Author =      {David Carmel and Shaul Markovitch},
  Title =       {How to explore your opponent's strategy (almost) optimally},
  Year =        {1998},
  Booktitle =   {Proceedings of the Third International Conference on Multi-Agent Systems},
  Pages =       {64--71},
  Keywords =    {Opponent Modeling, Exploration, Active Learning, Learning in Games},
  Secondary-keywords =  {Lookahead, Exploration vs. Exploitation, Learning DFA},
  Address =     {Paris, France},
  Url =         {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Carmel-Markovitch-icmas1998.pdf},
  Abstract =    {This work presents a lookahead-based exploration strategy for a model-based
             learning agent that enables exploration of the opponent's strategy during
             interaction in a multi-agent system. Instead of holding one model, the
             model-based agent maintains a mixed opponent model, a distribution over a set
             of models that reflects its uncertainty about the opponent's strategy. Every
             action is evaluated according to its long run contribution to the expected
             utility and to the knowledge regarding the opponent's strategy. We present an
             efficient algorithm that returns an almost optimal exploration strategy against
             a given mixed model, and a learning method for acquiring a mixed model
             consistent with the opponent's past behavior. We report experimental results in
             the Iterated Prisoner's Dilemma game that demonstrate the superiority of the
             lookahead-based exploration strategy over other exploration methods.}
}