
Showing posts with label metagames. Show all posts

Saturday, March 17, 2012

Game Theory XII: Randomness

We have seen that sometimes, it is to a player's advantage to remove one of their own options. Is it ever to a player's advantage to take their choice of action out of their own hands?

The following material is flagged Green Level. It is intended to reflect material that the author believes to be a matter of consensus among experts in the field. This belief may be incorrect, however; and as the author is not an expert and does not have an expert fact-checking the article, errors may creep in.
Before we start on this discussion, a vaguely relevant video. Spoilers for a certain VN/anime/manga series, even though it manages to not name many names.

But back to the question. Let's return to the game we started with:



       A         B
A   (-1, 1)   (1, -1)
B   (1, -1)   (-1, 1)
As you can see, this is a zero-sum game in which your objective is to make a different move from your opponent. But suppose your opponent knows you very well. Suppose your opponent knows you so well that they know what you will do. For instance, we'll say that you're playing this against a computer that has a complete copy of your brainstate loaded into it. No matter what you think, the computer can predict your thoughts. And no matter what you decide, the computer knows what you will do.

So, since you lose if you pick A, and you lose if you pick B, what do you do?

You make your move based on something that cannot be predicted, something that even you cannot predict: you make your move random. Flip a coin, spin a roulette wheel, or roll a die. You play what is called a mixed strategy.

Now, let's put together a general form of this:


       A         B
A   (-A, A)   (B, -B)
B   (C, -C)   (-D, D)

where A, B, C, D > 0

So, if you play a mixed strategy, you need to figure out what odds to give each move (assuming that your opponent will assign the same odds):

Odds of playing A    Outcome
0%                   -D
25%                  -(1/4*1/4)A + (1/4*3/4)B + (3/4*1/4)C - (3/4*3/4)D
                     = -A/16 + 3B/16 + 3C/16 - 9D/16
50%                  -(1/2*1/2)A + (1/2*1/2)B + (1/2*1/2)C - (1/2*1/2)D
                     = -A/4 + B/4 + C/4 - D/4
75%                  -(3/4*3/4)A + (3/4*1/4)B + (1/4*3/4)C - (1/4*1/4)D
                     = -9A/16 + 3B/16 + 3C/16 - D/16
100%                 -A
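The table above can be tabulated directly. Here is a minimal sketch in Python (the function name and the concrete values are mine, not part of the post), computing the row player's expected payoff when both players put the same probability p on move A:

```python
def expected_payoff(p, A, B, C, D):
    """Row player's expected payoff when both players choose move A with probability p."""
    return (-(p * p) * A              # both play A: row gets -A
            + p * (1 - p) * B         # row plays A, column plays B: row gets B
            + (1 - p) * p * C         # row plays B, column plays A: row gets C
            - (1 - p) * (1 - p) * D)  # both play B: row gets -D

# Reproduce the table's rows for the symmetric case A = B = C = D = 1:
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{p:.0%}: {expected_payoff(p, 1, 1, 1, 1):+.4f}")
```

In the symmetric case, the 50/50 mix is the only one that doesn't lose on average, which is why that is the mix to play against a perfect predictor.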

<<Metagames: The Punishing Prisoner's Dilemma | Game Theory | Metagames: The Instantaneous Punishing Prisoner's Dilemma>> 

Wednesday, December 21, 2011

Game Theory XI: Metagames: The Punishing Prisoner's Dilemma

Now, let us look at another perspective on the Prisoner's Dilemma. Suppose we allow the players to communicate. What happens?
Let us suppose that two of the players in the Prisoner's Dilemma make an arrangement: if either defects, that player must make a side payment to the other, equal to the harm inflicted by the defection. While the offer is available, the game looks like this (again, B>A>F>E and 2A>B+E); because A-E>B-A, as the derivation below shows, the equilibrium outcome of each subgame is marked beneath its table:
2A>B+E
2A-E>B
A-E>B-A


Offer, and the offer is accepted:

             Cooperate       Defect
Cooperate    (A, A)          (A, B-[A-E])
Defect       (B-[A-E], A)    (F, F)
= (A, A)

Offer, and the offer is rejected:

             Cooperate    Defect
Cooperate    (A, A)       (E, B)
Defect       (B, E)       (F, F)
= (F, F)

Do not offer (the response is moot; both cells are the same game):

             Cooperate    Defect
Cooperate    (A, A)       (E, B)
Defect       (B, E)       (F, F)
= (F, F)
As you can see, applying metagame logic here gives the players a rationale to cooperate: deterrence. Because defection requires a side payment, each player now benefits more from cooperation than from defection.
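That claim can be checked numerically. A quick sketch, with made-up payoffs satisfying the post's conditions B > A > F > E and 2A > B + E (the numbers are mine; the letters are the post's):

```python
# Illustrative payoffs: B > A > F > E and 2A > B + E both hold.
A, B, E, F = 3, 5, 0, 1

side = A - E  # the side payment a defector owes the player defected against

def row_payoff(row, col):
    """Payoff to the row player once the side-payment offer is accepted."""
    if row == "C" and col == "C":
        return A
    if row == "C" and col == "D":
        return E + side        # the payment makes the victim whole: E + (A - E) = A
    if row == "D" and col == "C":
        return B - side        # defection nets only B - (A - E), less than A
    return F                   # mutual defection

# Cooperation now strictly dominates defection for the row player:
for col in ("C", "D"):
    assert row_payoff("C", col) > row_payoff("D", col)
print("cooperate vs. a cooperator:", row_payoff("C", "C"), "> defect:", row_payoff("D", "C"))
```

The dominance follows exactly from A-E > B-A, as derived above; the loop just confirms it for one concrete choice of numbers.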
Let us examine what happens with pure punishment (no payment is made to someone who is defected against; instead, the defector simply loses the utility in question), with punishments of severity C and D, where C>B-A and D>F-E:


Offer, and the offer is accepted:

             Cooperate    Defect
Cooperate    (A, A)       (E, B-C)
Defect       (B-C, E)     (F-D, F-D)
= (A, A)

Offer, and the offer is rejected:

             Cooperate    Defect
Cooperate    (A, A)       (E, B)
Defect       (B, E)       (F, F)
= (F, F)

Do not offer (the response is moot; both cells are the same game):

             Cooperate    Defect
Cooperate    (A, A)       (E, B)
Defect       (B, E)       (F, F)
= (F, F)
So, even with no reparations made (or, depending on how one looks at it, no protection against the selfishness of others), each person's interests are served by taking this approach.
Canny readers may note that this is similar to the Hobbesian view of government: that without it, each person would defect (resulting in life being "nasty, brutish, and short" as each person pursues their own gain at the expense of everyone else), but the establishment of a justice system (preferably one as draconian as possible) results in each person cooperating out of self-interest. There are a few problems with this perspective, some of which I will cover here:
  1. As a general rule, people who break laws do not do so rationally; or at least, they do not expect to be caught. If lawbreakers made this calculation, there would be no crime in societies with harsh justice systems and near-universal surveillance.
  2. The game takes place over only one turn, or with each turn uninfluenced by previous turns. In the real world, future games are affected by previous ones, and punishment especially affects the benefits of cooperation in later rounds. This will be covered in a future installment of the Topic.
  3. The game assumes that justice is perfect: that is, that there is never a false conviction or a false acquittal.
Let us look at the third of these. We will have a probability P(punishment|cooperation), which is the probability of punishment given cooperation (that is, the odds that someone cooperating will be falsely convicted) and a probability P(punishment|defection), which is the probability of punishment given defection (that is, the odds that someone defecting will be convicted).
Remember that C and D are expected utilities: the actual severity of the punishment multiplied by the odds that the punishment will be carried out. Writing S for the true severity, the expected deterrent is S*P(punishment|defection), or more accurately S*(P[punishment|defection]-P[punishment|cooperation]), since a punishment you might receive anyway deters nothing. So the true severity of the punishment must be C or D divided by P(punishment|defection)-P(punishment|cooperation). For pure deterrence to work, then, a justice system must hand out stricter punishments the more mistakes it makes; and, assuming there is a strictest possible sentence, there is a theoretical point of inaccuracy beyond which the system simply cannot function.
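The required-severity formula is short enough to sketch in code. The function name and the probabilities below are illustrative, not from the post:

```python
def required_severity(C, p_punish_defect, p_punish_cooperate):
    """True sentence needed to deliver an expected deterrent of C,
    given imperfect conviction odds:  S = C / (P(pun|def) - P(pun|coop))."""
    gap = p_punish_defect - p_punish_cooperate
    if gap <= 0:
        raise ValueError("defectors are punished no more reliably than "
                         "cooperators; no sentence can deter")
    return C / gap

print(required_severity(10, 0.9, 0.0))   # fairly accurate system
print(required_severity(10, 0.5, 0.2))   # sloppy system: a much harsher sentence is needed
```

Note the failure mode the ValueError captures: once cooperators are punished as often as defectors, no finite sentence works, which is the "point of inaccuracy" mentioned above.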

Wednesday, December 14, 2011

Game Theory X: Metagames: Extortion

Since it's finals week and I'm busy, I've arranged for the Topic to have a guest lecturer. Well, two of them. They're going to be giving this lecture through an audio feed, as due to circumstances almost entirely within their control, they are unable to be here. Say hello to the Piranha Brothers.
Okay, so our topic this week: As we have seen, communication can be helpful. But can it ever be used to cause harm to another player? Are there any conditions under which it would be beneficial to pay for communication to be forbidden?
Dinsdale, Doug, I hope you've read the notes I sent you.
No, I used them for a bogroll. Of course I read them, you daft person!
Of course. And I can see why you wanted us to teach this class. If I understand right, he wants us to talk about the "Other Other Operation".
Right. So, the Other Other Operation. What we would do is, we would pop by someone's place, and say that if they did not pay us, we would beat them up. So, when this sit-you-ay-shun existed, the game looked a bit like this:

                     Pay              Do not pay
Break Face           (A-B, -[A+C])    (-B, -C)
Do Not Break Face    (A, -A)          (0, 0)

with all numbers positive, and A < C.
And no, don't ask how I pronounce a table like that.
So, what do those letters mean, Doug? I mean, I get that it's algebra, and that the letters represent numbers, but what do the numbers represent?
Well, A is obvious, innit? A is what we're asking our "client" to pay us. B is the cost to us to break a face, since it's such a pretty face and it pains us to ruin it. And C is, obviously, the cost of having a broken face. It should go without saying that the zeroes are there because if the person doesn't pay and we don't break any faces, nothing's changed.
Okay, I get that. But from what I've read, "Break Face" and "Pay" are both what he calls "dom-in-ay-ted stra-teh-jies". So why would we ever break a face, and why would anyone ever pay us?
That is a good point. Do not break face/do not pay is the Nash equilibrium. So what do you think, Dinsdale? Why would anyone pay us?
Might it have something to do with Tit-For-Tat? If they don't pay, we break their face next time?
No, Dins. Tit-For-Tat only works if there is a "next time". (Although punishment can play a role in it). Think carefully: what do we do before the game happens?
We... oh. So that's it? They pay because we threaten them? But why would we carry out the threat, since it's a dominated strategy?
Well, that gets into what we talk about. Let's say that we can make promises, and that those promises are binding. So we promise that, unless the other person promises to pay us, we will break their face.
That makes the metagame this, with each branch's rational outcome marked beneath it:

Promise to pay:

                     Pay
Break Face           (A-B, -[A+C])
Do Not Break Face    (A, -A)
= (A, -A)

Do not promise to pay (our threat binds us to break face):

                     Pay              Do not pay
Break Face           (A-B, -[A+C])    (-B, -C)
= (-B, -C)

Overall: = (A, -A)
Wow, Doug, you really do talk pretty. But it's not like we can make promises we can't break.
Right. So let's say there's something that will punish us for breaking our word by an amount D, and will punish the other person for breaking theirs by an amount E. For instance, loss of reputation, or a bunch of legbreakers coming around. If D > B and E > A, the metagame now looks like this, with each subgame's rational outcome marked beneath it:

Make threat, client promises to pay:

                     Pay              Do not pay
Break Face           (A-B, -[A+C])    (-B, -[C+E])
Do Not Break Face    (A, -A)          (0, -E)
= (A, -A)

Make threat, client does not promise to pay:

                     Pay              Do not pay
Break Face           (A-B, -[A+C])    (-B, -C)
Do Not Break Face    (A-D, -A)        (-D, 0)
= (-B, -C)

Do not make threat (either response):

                     Pay              Do not pay
Break Face           (A-B, -[A+C])    (-B, -C)
Do Not Break Face    (A, -A)          (0, 0)
= (0, 0)

Overall: = (A, -A)
So, to us, the ability to communicate and make these kinds of promises is worth A, and worth -A to our "client". In other words, our "client" would pay any amount up to A to keep us from being able to make promises. And we would pay any amount up to A to be able to.
Doug, there was something in the notes about "side payments"...
Oh, right. See, in some games, it's possible for one player to make a payment to the other beyond the game itself. Here we don't quite need that, since we can simply set A to whatever we want, which is effectively the same thing. The point is, as long as A < C, the threat still works: as long as the amount we're asking for is less than the harm caused by a broken face, we can ask whatever we like.
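The Brothers' bottom line can be checked with a few lines of Python. The concrete numbers are illustrative (mine, not the post's), chosen so that A < C, D > B, and E > A hold:

```python
# Outcomes of the metagame branches, read off the tables above, as
# (extortionist, client) pairs.
A, B, C, D, E = 5, 2, 10, 3, 6   # illustrative: A < C, D > B, E > A

outcomes = {
    ("threat", "promise"):       (A, -A),    # face intact, payment made
    ("threat", "no promise"):    (-B, -C),   # the threat must be carried out
    ("no threat", "promise"):    (0, 0),
    ("no threat", "no promise"): (0, 0),
}

# Facing a threat, the client prefers promising (-A) to refusing (-C), since A < C:
assert outcomes[("threat", "promise")][1] > outcomes[("threat", "no promise")][1]

# Anticipating that, the extortionist prefers threatening (A) to not threatening (0):
value_of_threat = outcomes[("threat", "promise")][0] - outcomes[("no threat", "promise")][0]
print("value of being able to threaten:", value_of_threat)  # equals A
```

This is just backward induction over the tables: the client's best response fixes the column, and the extortionist's choice between the branches is then worth exactly A, matching the claim above.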
<<Metagames: Coordination and Anti-Coordination | Game Theory | Metagames: The Punishing Prisoner's Dilemma>>  

Wednesday, December 7, 2011

Game Theory: Part IX: Coordination and Anti-Coordination

In the last game, we assumed that the players are able to coordinate, in order to avoid a mutual mistake. But we cannot always confer with others before making decisions. And so, we must decide: how much is it worth to confer with someone else?
So, let us look at two classic games, the Coordination Game and the Anti-Coordination Game. These are very common; one example of the Coordination Game might be deciding which side of the road to drive on.
The Coordination Game:


       X        Y
X   (A, A)   (B, B)
Y   (B, B)   (A, A)

The Anti-Coordination Game:

       X        Y
X   (B, B)   (A, A)
Y   (A, A)   (B, B)

where in all cases A>B.

So, as you can see, in the coordination game, the players do better if they play the same move. In the anti-coordination game, the players do better if they play different moves. And in neither case is one move objectively better for one player than its alternative. So, how do the players resolve this?
The solution is obvious. The players must arrange ahead of time what move they will make.

(The players gain nothing from deviating from their arrangement, so they gain no benefit from the ability to make binding promises. In effect, any promise either player makes will punish them if they break it. This will be important later.)

Now, let us say that the players cannot get this for free. Maybe the players are playing via the postal service, and need to pay for stamps. How much should the players pay, maximum, for the ability to coordinate their moves? (As mentioned in Utility Functions, we are assuming that there is some conversion function between the payment and what the players are rewarded in.)

Let's look at what will happen if the players cannot coordinate. Since neither move is dominant, and in fact no move offers any advantage over the other, the players have no choice but to act randomly. Each move will be taken 50% of the time. (It occurs to me that perhaps I should explain how to calculate the odds with which you should decide your moves. Later. For now, just accept that the moves should be played with even odds.) So, the value of this game to each player is:

A/2+B/2.

So what is the value of the game where the players are allowed to communicate? Well, that should be obvious. Neither player stands to gain anything by going against their agreement, so the value of that game is A. So, the value of the ability to communicate is, in this case, A-(A/2+B/2)=A-A/2-B/2=A/2-B/2.
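The arithmetic above can be sketched and sanity-checked by simulation. The payoff values are illustrative (mine, not the post's), with A > B:

```python
import random

A, B = 10, 2   # illustrative, with A > B

# With an agreement, the players always match: value A each.
coordinated = A

# Without one, each plays 50/50 independently; they match half the time.
uncoordinated = A / 2 + B / 2

print("worth of communication:", coordinated - uncoordinated)   # A/2 - B/2

# Sanity check of the uncoordinated value by simulation:
random.seed(0)
trials = 200_000
total = 0
for _ in range(trials):
    same = (random.random() < 0.5) == (random.random() < 0.5)
    total += A if same else B
print("simulated uncoordinated value:", total / trials)   # close to (A + B) / 2
```

With these numbers the agreement is worth (10 - 2)/2 = 4 to each player, and the simulation's average should come out near 6, the uncoordinated value.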

<<Metagames: The Battle of the Sexes | Game Theory | Metagames: Extortion>>

Wednesday, November 30, 2011

Game Theory: Part VIII: Metagames: The Battle of the Sexes

Now, on to something a bit more optimistic. In game theory, it is a basic assumption that you are basing your actions on what your opponent can do, and your opponent is basing their actions on what you can do.
It can be to your advantage to change your own options, since this changes your opponent's decision. Does this extend to limiting yourself?
Let's look at a classic game. We have a husband and a wife deciding on what that night's entertainment is going to be. Since this was developed by someone living in a culture based heavily in gender stereotypes, the husband wants to go to the fights, and the wife wants to go to the opera. However, each would rather go to their less-preferred entertainment with the other than to their preferred entertainment alone.
So, the game looks like this, with the husband across the top and the wife down the side:



          Fights    Opera
Fights    (B, A)    (D, D)
Opera     (C, C)    (A, B)

where A > B > C > D.
So, it is in the best interest of each of them to convince the other to go to their preferred entertainment.
But there is a solution to this. Suppose we offer one player the ability to make themselves unable to choose one of the options. Or, alternately, let's compare the outcome for a player who has disabled an option with the outcome for one who has not.
The classic example in this case (again, heavily based in Victorian gender stereotypes) is that the wife faints at the sight of blood, so let's apply that.
Now then, we will need to set up a metagame: a game that determines the rules of the next game. This, by the way, is something else that gets absolutely everywhere, from law to history to kids' games.
Our metagame looks a bit like this:



Disable (the wife can no longer choose "Fights"):

          Fights    Opera
Opera     (C, C)    (A, B)

Do not disable:

          Fights    Opera
Fights    (B, A)    (D, D)
Opera     (C, C)    (A, B)
And now, let's calculate the value of these games. On the Disable side, the husband is the only one left with a choice. He would rather go to the opera with his wife than to the fights alone, so he will pick "Opera". The value of the first option for the wife is A, and the value for the husband is B.
And now, let's look at the "Do not disable" option. We will, for simplicity's sake, assume that the two are able to discuss and coordinate ahead of time (a subject for later discussion), and that they pick each option half of the time. That is, they alternate between going to the fights together or to the opera together, or they flip a coin to decide. The value for each player is therefore (A+B)/2.
So, we conclude that each player stands to benefit from the ability to disable certain options, as long as the other player does not share that ability. (If that were the case, the value for each player would obviously be C.)
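The comparison between the two branches can be written out in a few lines. The payoff values below are illustrative (mine, not the post's), satisfying A > B > C > D:

```python
# Values of the two metagame branches, as (wife, husband) pairs.
A, B, C, D = 4, 3, 2, 1   # illustrative: A > B > C > D

# Disable: the wife cannot choose "Fights", so the husband picks between opera
# together (B for him) and the fights alone (C for him); since B > C, he picks opera.
disable_value = (A, B)

# Do not disable: they coordinate and alternate, so each gets (A + B)/2.
shared_value = (A + B) / 2

# Disabling an option helps the disabler and hurts the other player:
assert disable_value[0] > shared_value > disable_value[1]
print("disable:", disable_value, " alternate:", shared_value)
```

The inequality is the whole point: the wife's A beats the alternating value (A+B)/2, which in turn beats the husband's B, so the ability to self-limit is valuable exactly when it is one-sided.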
For extra credit, go back to the "Chicken" game, and try thinking about it in terms of metagames and disabled options. What happens if one player locks their wheel into a particular position?