David Carmel and Shaul Markovitch. *The M\* Algorithm: Incorporating Opponent Models Into Adversary Search*. Technical Report CIS9402, Computer Science Department, Technion, Haifa, Israel, 1994.

While human players adjust their playing strategy to their opponent, computer programs based on the minimax algorithm use the same strategy against a novice as against an expert, because minimax assumes that the opponent uses the same strategy as the player. This work studies the problem of opponent modelling in game playing. We recursively define a player as a pair of a strategy and an opponent model, which is itself a player. A strategy is determined by a static evaluation function and a depth of search. M*, an algorithm for searching game trees with an n-level modelling player that uses such a strategy, is described and analyzed. We demonstrate experimentally the benefit of using an opponent model and show the potential harm caused by using an inaccurate one. We then describe M*-epsilon, an algorithm for using uncertain models when a bound on the model error is known. Pruning in M* is impossible in the general case; we prove a sufficient condition for pruning and present a pruning algorithm, alpha-beta*, that returns the M* value of a tree while searching only the necessary subtrees. Finally, we present a method for acquiring a model of an unknown player. First, we describe a learning algorithm that acquires a model of the opponent's depth of search, using its past moves as examples. Then an algorithm for acquiring a model of that player's strategy, both depth and function, is described and evaluated. Experiments with this algorithm show that when a superset of the features used by a fixed opponent is available to the learner, few examples suffice to learn a model that agrees almost perfectly with the opponent.
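The core idea of a one-level modelling search can be sketched as follows. This is a minimal illustration, not the paper's code: the tree encoding, the function names, and the fixed two-ply example are my own. The player searches with its own evaluation `f_player`, but at opponent nodes it predicts the reply by running plain minimax with the opponent's modelled evaluation `f_opp`, then takes its own value of the predicted child.

```python
# Toy one-level opponent-model search (M1*-style sketch, not the authors' code).
# A tree is a list of subtrees; a leaf is a (player_score, modelled_opp_score)
# pair, both scored from the root player's point of view.

def minimax(tree, depth, f, maximizing):
    """Plain minimax: the standard model-free baseline."""
    if depth == 0 or isinstance(tree, tuple):  # leaf (toy eval only at leaves)
        return f(tree)
    vals = [minimax(c, depth - 1, f, not maximizing) for c in tree]
    return max(vals) if maximizing else min(vals)

def m1_star(tree, depth, f_player, f_opp, player_to_move=True):
    """One-level modelling search: the opponent is modelled as a minimax
    player over f_opp (so, from the player's point of view, it minimizes)."""
    if depth == 0 or isinstance(tree, tuple):
        return f_player(tree)
    if player_to_move:
        return max(m1_star(c, depth - 1, f_player, f_opp, False) for c in tree)
    # Opponent node: predict the reply with the opponent's own evaluation,
    # then value the predicted child with the player's evaluation.
    predicted = min(tree, key=lambda c: minimax(c, depth - 1, f_opp, True))
    return m1_star(predicted, depth - 1, f_player, f_opp, True)

f_player = lambda leaf: leaf[0]  # player's own static evaluation
f_opp    = lambda leaf: leaf[1]  # the opponent's modelled evaluation

# Two-ply example: on the right branch the modelled opponent misjudges the
# position worth 9 to us, so the modelling player can safely go right,
# while minimax (assuming a like-minded opponent) settles for the left.
tree = [
    [(3, 3), (5, 5)],   # left: a like-minded opponent answers with the 3
    [(9, 4), (2, 8)],   # right: the modelled opponent prefers the leaf worth 9 to us
]

print(minimax(tree, 2, f_player, True))   # 3: minimax distrusts the right branch
print(m1_star(tree, 2, f_player, f_opp))  # 9: the model predicts a favourable reply
```

The example also hints at the paper's warning about inaccurate models: if `f_opp` mispredicts the real opponent, the modelling player may walk into the 2-valued leaf it believed the opponent would avoid.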

@techreport{Carmel:1994:MAI,
  Author             = {David Carmel and Shaul Markovitch},
  Title              = {The {M*} Algorithm: Incorporating Opponent Models Into Adversary Search},
  Year               = {1994},
  Number             = {CIS9402},
  Type               = {Technical Report},
  Institution        = {Computer Science Department, Technion},
  Address            = {Haifa, Israel},
  Url                = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Carmel-Markovitch-CIS9402.pdf},
  Keywords           = {Opponent Modeling, Learning in Games, Multi-Agent Systems, Games},
  Secondary-keywords = {M*, Mstar, Mstar-epsilon, Pruning, OM Search, Adversary Search},
  Abstract           = {While human players adjust their playing strategy to their opponent, computer programs based on the minimax algorithm use the same strategy against a novice as against an expert, because minimax assumes that the opponent uses the same strategy as the player. This work studies the problem of opponent modelling in game playing. We recursively define a player as a pair of a strategy and an opponent model, which is itself a player. A strategy is determined by a static evaluation function and a depth of search. M*, an algorithm for searching game trees with an n-level modelling player that uses such a strategy, is described and analyzed. We demonstrate experimentally the benefit of using an opponent model and show the potential harm caused by using an inaccurate one. We then describe M*-epsilon, an algorithm for using uncertain models when a bound on the model error is known. Pruning in M* is impossible in the general case; we prove a sufficient condition for pruning and present a pruning algorithm, alpha-beta*, that returns the M* value of a tree while searching only the necessary subtrees. Finally, we present a method for acquiring a model of an unknown player. First, we describe a learning algorithm that acquires a model of the opponent's depth of search, using its past moves as examples. Then an algorithm for acquiring a model of that player's strategy, both depth and function, is described and evaluated. Experiments with this algorithm show that when a superset of the features used by a fixed opponent is available to the learner, few examples suffice to learn a model that agrees almost perfectly with the opponent.}
}