Hello! Welcome to today’s lesson, in which I’ll show you how to identify winning positions in a game. This will help you to improve the strategy for your artificial intelligence in PyRat.
Winning strategies and positions
But once the game has started, and if the players at some point make different moves and break symmetry, the subsequent game will likely admit winning strategies for one of the players.
In practice, this means that computing winning strategies can only be performed when the variability of the game is reduced, such as when there are just a few pieces of cheese left in the maze.
A vertex in the arena is a winning position for a player if, starting from that vertex, the player has a winning strategy.
Of course, a vertex cannot be winning for both players. If this were the case, it would mean that both players have a strategy that wins against any strategy of the opponent. This would imply that when they both play with their winning strategies they both win, which is not possible. So a vertex can only be winning for one player.
Computing winning strategies
So now let’s see how to compute winning strategies. Ready?
If you’re at a given vertex in the arena, to find the best move, the one that keeps you in the winning region of the arena, you have to take into account all possible subsequent choices of the opponent. In a game like PyRat, the number of such combinations is probably too big to be processed.
A better approach involves storing partial knowledge about winning strategies and positions. This type of approach is called « dynamic programming ».
It’s possible to characterize the state of a game by the position of the rat, the position of the python, the locations of remaining pieces of cheese, and the score of each player.
Remember that the vertices in the arena represent game configurations.
Now we can try to identify other vertices of the arena where one of the players is ensured to win.
You should note that that this way of building a winning strategy can only be computed for a small maze with a limited number of pieces of cheese, and isn’t really feasible for large mazes. When it comes to large mazes, a better approach would involve making assumptions about the opponent’s decisions, which would reduce the combinatorial factors.
This finishes today’s lesson. You’ve learned how winning strategies can be generated, but also that, for larger games and for some computational issues, it might be necessary to take the opponent’s strategy into account.
It’s now up to you to implement all this in your AI in PyRat!
Sadly, this video was the last one in this MOOC. I have really enjoyed exploring the topics of this course with you! Vincent, Nicolas and I hope that you’ve learned many fascinating things on algorithms, and that your AI will be clever enough to beat many opponents.
May the cheese be with you!
The maximin algorithm
In game theory (more precisely in complete information, zero-sum games), the maximin algorithm aims at finding the best move to make at each step, assuming the opponent will play optimally. The goal for the opponent is therefore to minimize our gain, while ours is to maximize it.
Concretely, the main idea of the algorithm is to start from the leaves of the tree of solutions, and to give a score to each of these leaves, corresponding to the gain it provides us.
Then, we will go up the tree until we reach the root, by alternating minimizations of the scores, and maximizations. This alternance models the choices of either the player, or the opponent.
Let us describe what happens on an example, taken from the French Wikipedia page on the Minimax algorithm (minimax is just the symmetric problem of maximin in which we want to minimize some loss function rather than maximizing a gain):
Here, we have a tree of possible executions of a game, in which depths modeled with squares indicate our turn to play, and circles indicate the opponent’s turn.
For each circle node, the opponent will try to minimize our gain, by choosing to perform the action that leads to the minimum gain for us. Therefore, the minimax algorithm will assume the following choices:
What it now means is that, if I reach the state of the game B1, the opponent will play so that our total gain is 3, obtained from state C3. If we make the game go to state B2, our opponent can play so that gain is 5 by enforcing state C4. Finally, if we play to reach situation B3, the opponent can play so that gain is 2 by enforcing state C8.
The optimal choice for us is therefore to maximize the gain, and to play so that we reach state B2. Making this choice, the best gain one can expect when facing an optimal opponent is 5:
The minimax algorithm consists in alternating these steps of minimizations and maximizations, in order to maximize the gain we would ultimately expect when making a decision.
It is important to notice that the maximin algorithm will not return the sequence of plays that lead to the highest achievable gain (which in the example above would be C9 with a gain of 12), but the least worst sequence of play, i.e., the one that realistically leads to our best gain.
To conclude, notice that here we alternate decisions between players, which implies they take their decisions in turn. Still, there exist equivalent results for simultaneous games such as PyRat.