
The M* Algorithm: Incorporating Opponent Models Into Adversary Search


David Carmel and Shaul Markovitch. The M* Algorithm: Incorporating Opponent Models Into Adversary Search. Technical Report CIS9402, Computer Science Department, Technion, 1994.


Abstract

While human players adjust their playing strategy according to their opponent, computer programs based on the minimax algorithm use the same playing strategy against a novice as against an expert. This is due to minimax's assumption that the opponent uses the same strategy as the player. This work studies the problem of opponent modelling in game playing. We recursively define a player as a pair consisting of a strategy and an opponent model, which is itself a player. A strategy is determined by a static evaluation function and a depth of search. M*, an algorithm for searching game trees with an n-level modelling player that uses such a strategy, is described and analyzed. We demonstrate experimentally the benefit of using an opponent model and show the potential harm caused by using an inaccurate model. We then describe an algorithm, M*-epsilon, for using uncertain models when a bound on the model error is known. Pruning in M* is impossible in the general case. We prove a sufficient condition for pruning and present a pruning algorithm, alpha-beta*, that returns the M* value of a tree while searching only the necessary subtrees. Finally, we present a method for acquiring a model of an unknown player. First, we describe a learning algorithm that acquires a model of the opponent's depth of search, using its past moves as examples. Then, an algorithm for acquiring a model of the player's strategy, both depth and function, is described and evaluated. Experiments with this algorithm show that when a superset of the features used by a fixed opponent is available to the learner, few examples suffice to learn a model that agrees almost perfectly with the opponent.
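The report itself gives the precise formulation of M*; as a rough illustration of the recursive player definition and the search it induces, the following Python sketch assumes a hypothetical position interface (successors(), is_terminal()), and all names (Player, m_star, opponent_turn) are my own. It also re-runs the opponent model's search naively at each opponent node, rather than using the efficient one-pass algorithm developed in the report.

  from dataclasses import dataclass
  from typing import Callable, Optional

  @dataclass
  class Player:
      # A player is a strategy -- a static evaluation function plus a
      # depth of search -- together with a model of its opponent,
      # which is itself a Player. model=None denotes a zero-level
      # player, i.e. one that falls back on the minimax assumption.
      evaluate: Callable[[object], float]
      depth: int
      model: Optional["Player"] = None

  def m_star(pos, player: Player, depth: int) -> float:
      # Value of pos for `player` when it is the player's turn.
      if depth == 0 or pos.is_terminal():
          return player.evaluate(pos)
      return max(opponent_turn(c, player, depth - 1)
                 for c in pos.successors())

  def opponent_turn(pos, player: Player, depth: int) -> float:
      # Value of pos for `player` when it is the opponent's turn.
      if depth == 0 or pos.is_terminal():
          return player.evaluate(pos)
      if player.model is None:
          # Zero-level player: assume the opponent minimizes our own
          # value -- exactly the minimax assumption.
          return min(m_star(c, player, depth - 1)
                     for c in pos.successors())
      # Simulate the opponent's decision with its own strategy and
      # its own (n-1)-level model, then continue our search below
      # the move the model predicts.
      predicted = max(pos.successors(),
                      key=lambda c: opponent_turn(c, player.model,
                                                  player.model.depth - 1))
      return m_star(predicted, player, depth - 1)

With model=None throughout, the recursion collapses to plain minimax with the player's own function, which matches the abstract's observation that minimax implicitly assumes the opponent shares the player's strategy.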


Keywords: Opponent Modeling, Learning in Games, Multi-Agent Systems, Games
Secondary Keywords: M*, Mstar, Mstar-epsilon, Pruning, OM Search, Adversary Search
Online version: http://www.cs.technion.ac.il/~shaulm/papers/pdf/Carmel-Markovitch-CIS9402.pdf
Bibtex entry:
@techreport{Carmel:1994:MAI,
  Author = {David Carmel and Shaul Markovitch},
  Title = {The M* Algorithm: Incorporating Opponent Models Into Adversary Search},
  Year = {1994},
  Number = {CIS9402},
  Type = {Technical Report},
  Institution = {Computer Science Department, Technion},
  Address = {Haifa, Israel},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Carmel-Markovitch-CIS9402.pdf},
  Keywords = {Opponent Modeling, Learning in Games, Multi-Agent Systems, Games},
  Secondary-keywords = {M*, Mstar, Mstar-epsilon, Pruning, OM Search, Adversary Search},
  Abstract = {
    While human players adjust their playing strategy according to
    their opponent, computer programs based on the minimax algorithm
    use the same playing strategy against a novice as against an
    expert. This is due to minimax's assumption that the opponent
    uses the same strategy as the player. This work studies the
    problem of opponent modelling in game playing. We recursively
    define a player as a pair consisting of a strategy and an
    opponent model, which is itself a player. A strategy is
    determined by a static evaluation function and a depth of
    search. M*, an algorithm for searching game trees with an
    n-level modelling player that uses such a strategy, is described
    and analyzed. We demonstrate experimentally the benefit of using
    an opponent model and show the potential harm caused by using an
    inaccurate model. We then describe an algorithm, M*-epsilon, for
    using uncertain models when a bound on the model error is known.
    Pruning in M* is impossible in the general case. We prove a
    sufficient condition for pruning and present a pruning
    algorithm, alpha-beta*, that returns the M* value of a tree
    while searching only the necessary subtrees. Finally, we present
    a method for acquiring a model of an unknown player. First, we
    describe a learning algorithm that acquires a model of the
    opponent's depth of search, using its past moves as examples.
    Then, an algorithm for acquiring a model of the player's
    strategy, both depth and function, is described and evaluated.
    Experiments with this algorithm show that when a superset of the
    features used by a fixed opponent is available to the learner,
    few examples suffice to learn a model that agrees almost
    perfectly with the opponent.
  }
}