
# A Tutorial on Repeated Games

## Motivation

Many interactions occur more than once (buyers and sellers, firms in an economy), so a natural question arises: does repetition change the nature of the interaction? For example, firms may collude, and altruistic behaviour may be sustained in a society.

Example: The Prisoner's Dilemma

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,5    |
| Defect    | 5,0       | 1,1    |

• Played one time only: Defect is strictly dominant.
• Infinitely repeated: it is possible to sustain cooperation: cooperate as long as the other player does.

But is it possible to sustain cooperation if the game is played a finite number of times?

Analysis: In the last period, regardless of what has happened before, the unique best response of each player is to defect. By backward induction, play in earlier periods cannot affect the last period, so the second-to-last period is effectively a one-shot game and both players defect there as well. Iterating this argument, every period is played as if it were the last, and both players defect throughout.
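The backward-induction logic rests on Defect being strictly dominant in the stage game. A minimal sketch (the encoding and helper are my own, not from the text) that checks this, using stage payoffs consistent with the 3, 1, 3, 1, … stream used later:

```python
# Stage-game payoffs for player 1 (the game is symmetric):
# rows = player 1's action, columns = player 2's action, in the order (C, D).
C, D = 0, 1
u1 = [[3, 0],   # player 1 cooperates
      [5, 1]]   # player 1 defects

# Defect strictly dominates Cooperate: it is strictly better against
# every action of the opponent.
strictly_dominant = all(u1[D][a] > u1[C][a] for a in (C, D))
print(strictly_dominant)  # True, so both defect in the last period,
                          # and by backward induction in every period
```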

## The game being played each period

• a_i^t ∈ A_i : the action of player i taken in period t
• a^t = (a_1^t, ..., a_n^t) : the profile of actions played at t

### History

A history is a list of what has happened in every period up to the present.
• h^t = (a^1, ..., a^t) : the list of action profiles played in the first t periods
• H^t : the set of all possible histories of length t
• H = ∪_t H^t : the set of all possible finite histories
• s_i : H → Δ(A_i) : a strategy of player i
• s_i(h^t) specifies what player i will do after the history h^t

So s_i(h^t) is a choice of play in period t+1, made after observing what happened in all previous periods.
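To make the history/strategy notation concrete, here is a small sketch (the encoding is my own): a history is a tuple of action profiles, and a strategy is a function from histories to actions, with grim trigger as the example:

```python
# A history h^t is a tuple of action profiles; each profile is a tuple of
# the actions ('C' or 'D') taken by the players in one period.

def grim_trigger(history):
    """s_i(h^t): cooperate until any player has ever defected,
    then defect forever."""
    if any('D' in profile for profile in history):
        return 'D'
    return 'C'

h0 = ()                              # empty history: start of the game
h2 = (('C', 'C'), ('C', 'D'))        # a history of length 2
print(grim_trigger(h0))              # 'C'
print(grim_trigger(h2))              # 'D' -- a defection has occurred
```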

## Strategies

• A strategy specifies what to do after every possible history.
• Some histories might never happen (they are "off the equilibrium path"), but it can still be important to specify what would happen if such a history occurred.

## Look at the example of the Prisoner's Dilemma

• Suppose the players play (C,C), (D,D), (C,C), (D,D), … forever.
• The stage-game payoffs are then 3, 1, 3, 1, …

The overall payoff in a game with t repetitions can be represented as
Σ_{x=1..t} E[u_i^x]

For infinitely repeated games there are two common criteria:

Limit-average reward: lim inf_{t→∞} (1/t) Σ_{x=1..t} E[u_i^x]. For example, if the payoffs are 3, 1, 3, 1, …, the limit-average payoff is 2.

Future-discounted reward: Σ_{x=1..∞} β^(x-1) E[u_i^x]. For example, if the stage payoffs are 3, 1, 3, 1, … and the discount factor is β = 0.9, then the payoff is
3 + 1(0.9) + 3(0.9)² + 1(0.9)³ + …

The discount factor β captures the idea that the present matters more than the future.
The definition of Nash equilibrium remains unchanged in the repeated setting.
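A numerical sketch of the two aggregation rules (the function names are my own); truncating at a long horizon approximates the infinite sums for the alternating stream 3, 1, 3, 1, …:

```python
from itertools import cycle, islice

def limit_average(stage_payoffs, t):
    """(1/t) * sum of the first t payoffs of the repeating stream."""
    return sum(islice(cycle(stage_payoffs), t)) / t

def discounted(stage_payoffs, beta, t):
    """Sum over x of beta**(x-1) * u_x, truncated after t periods."""
    return sum(beta**x * u
               for x, u in enumerate(islice(cycle(stage_payoffs), t)))

print(limit_average([3, 1], 100000))   # -> 2.0
print(discounted([3, 1], 0.9, 1000))   # -> (3 + 0.9)/(1 - 0.81) ~= 20.53
```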

## Subgame Perfect Equilibrium

• After any history h^t we can look at how the game will continue from that point on.
• This continuation is a proper subgame.
• Let s_i(·|h^t) denote player i's strategy for the remainder of the game after the history h^t.
• Subgame perfect equilibrium: for every i and every h^t, s_i(·|h^t) is a best response to s_-i(·|h^t).

## Consider the Prisoner's Dilemma example

• With probability p the game continues to the next period; with probability (1-p) it ends.
• One equilibrium: defect in every period, regardless of what has happened in the past.
• Another equilibrium: cooperate until someone defects, then defect forever after (regardless of further histories).
• Payoff from cooperating forever, if the other player plays this way:
  3 + 3p + 3p² + 3p³ + …
• Starting from some period in which both players have cooperated in the past:
  • if you continue to cooperate: 3 + 3p + 3p² + 3p³ + …
  • if you defect: 5 + p + p² + p³ + …
• Gain from defection: 2 today.
• Loss from defection: 2p + 2p² + 2p³ + … = 2p/(1-p).
• Cooperating is optimal if 2 ≤ 2p/(1-p).
• So if p ≥ 1/2, cooperation is a best response.

It is harder to sustain cooperation as the one-time gain from deviation increases, and easier to sustain cooperation as the continuation probability increases.
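The comparison above can be checked directly. A small sketch (the helper name is mine), with the one-period gain of 2 and the per-period future loss of 2:

```python
def grim_trigger_sustains(p, gain=2.0, loss_per_period=2.0):
    """Cooperation holds iff today's gain from defecting does not exceed
    the expected future loss: gain <= loss * p/(1-p)."""
    return gain <= loss_per_period * p / (1 - p)

print(grim_trigger_sustains(0.5))   # True: exactly at the threshold
print(grim_trigger_sustains(0.4))   # False: the future carries too little weight
```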

## "Grim Trigger" Equilibrium

• Cooperation is sustained by the threat of permanent defection if cooperation ever breaks down.
• If the threat is "credible", it is an equilibrium.
• If enough weight is placed on future interactions, it is possible to sustain behaviours that cannot be sustained in the short run.
• Repetition changes the interaction: players can respond to each other's actions, and that creates different incentives.

## The Folk Theorem

This theorem characterizes the many Nash equilibria arising in indefinitely repeated games. Consider any stage (one-shot) game. Any n-1 agents can hold the remaining agent down to his minmax value in that one-shot game, so any deviation can be punished to yield at most that value. They can therefore enforce any strategy profile that gives each agent at least his minmax value, by threatening to play the minmax strategy against a deviator forever. This is a "grim trigger" strategy.

• Consider any normal-form game G = (N, (A_i), (u_i)), and let G' be its infinite repetition with limit-average rewards.
• Define a payoff vector r = (r_1, …, r_n) to be enforceable if r_i ≥ v_i for every i, where v_i is agent i's minmax value in G.
• Define r = (r_1, …, r_n) to be feasible if it is the payoff of some mixed-strategy profile with rational weights.

Then:
• If r is the payoff vector of some Nash equilibrium of G', then r is enforceable.
• If r is both enforceable and feasible, then r is the payoff vector of some Nash equilibrium of G'.
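A sketch of the enforceability check for the Prisoner's Dilemma (the helper names are mine; restricting attention to pure strategies suffices here):

```python
# Payoffs indexed by the action profile (a1, a2).
payoffs = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
A = ['C', 'D']

def minmax_value(i):
    """v_i: the worst the opponent can hold player i to,
    given that i best-responds (pure strategies only)."""
    if i == 0:
        return min(max(payoffs[(a1, a2)][0] for a1 in A) for a2 in A)
    return min(max(payoffs[(a1, a2)][1] for a2 in A) for a1 in A)

def enforceable(r):
    return all(r[i] >= minmax_value(i) for i in (0, 1))

print(minmax_value(0), minmax_value(1))  # 1 1
print(enforceable((3, 3)))               # True
print(enforceable((5, 0)))               # False: 0 < player 2's minmax value
```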

## Exercises: Solved problems and examples to help you understand the topic and apply the principles

1. Consider the following game to be played 100 times

| Pred/Prey | Active   | Passive |
|-----------|----------|---------|
| Active    | 1.7,-0.8 | 3,-1    |
| Passive   | 1.6,-0.7 | 0,0     |

Which of the following is true of the backward-induction outcome?
a) (Passive, Active) can appear in some period;
b) (Passive, Passive) can appear in some period;
c) (Active, Passive) can appear in some period;
d) Only (Active, Active) appears in each period.

Solution: In the one-shot game, the unique (Nash) equilibrium is (Active, Active), so that must be played in the last period regardless of what comes before it.
Play in earlier periods therefore has no impact on the last period, so in the second-to-last period the game is effectively one-shot and (Active, Active) is played there too.
Following the same logic as in the finitely repeated Prisoner's Dilemma, (Active, Active) is the only possible outcome in every period.
Thus d) is true.

2. Consider the following game:
| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,5    |
| Defect    | 5,0       | 1,1    |

Write out H^1, the set of all possible histories of length 1.
Solution:

H^1 = { (C,C), (C,D), (D,C), (D,D) }

3. Consider the matching-pennies game:
| 1/2   | Left | Right |
|-------|------|-------|
| Left  | 1,-1 | -1,1  |
| Right | -1,1 | 1,-1  |

How many elements are there in H^3?
Solution: H^1 has 4 elements: (L,L), (L,R), (R,L) and (R,R).
H^2 has 4×4 elements of the form (a^1, a^2), where a^1 and a^2 each take 4 possible values (the profiles in H^1).
H^3 has 4×4×4 elements of the form (a^1, a^2, a^3), where each a^t takes 4 possible values.
Hence the answer is 4³ = 64.
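The counting argument can be reproduced by brute force (a sketch using itertools):

```python
from itertools import product

H1 = list(product(['L', 'R'], repeat=2))  # the 4 action profiles
H3 = list(product(H1, repeat=3))          # all histories of length 3

print(len(H1))  # 4
print(len(H3))  # 64
```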

4. Consider the following game, to be played 100 times. Which of the following is true of the backward-induction outcome?
b) Only (Bad, Not) appears in each period.
c) Only (Good, Buy) appears in each period.
d) Only (Good, Not) appears in each period.
Solution: Using the argument from question 1, it is easy to see that b) is true.

5. Consider the following variant of the matching-pennies game:

| 1/2   | Left | Middle | Right |
|-------|------|--------|-------|
| Left  | 2,-2 | -2,2   | -2,2  |
| Right | -2,2 | -2,2   | 2,-2  |

How many elements are there in H^3?
Solution: H^1 has 2×3 = 6 elements, H^2 has 6×6 elements, and so H^3 has 6³ = 216 elements.

6. Consider an indefinitely repeated game such that with probability p the game continues to the next period and with probability (1-p) it ends.
The "grim trigger" strategy is: cooperate as long as the other player does, and defect forever after if the other player ever defects.

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 4,4       | 0,5    |
| Defect    | 5,0       | 1,1    |

If the other player uses a grim trigger strategy, what is the total expected payoff from always cooperating?
Solution: If a player always cooperates then, given that the other player uses grim trigger, the other player always cooperates as well. Each earns 4 every period, for a total of 4 + 4p + 4p² + … = 4/(1-p).

7. In the previous question, what is the total expected payoff from always defecting?
Solution: If a player always defects then, given that the other uses grim trigger, the other cooperates in period 1 and defects from period 2 onwards. The deviating player therefore earns 5 in period 1 and 1 in every period after, for a total of 5 + p + p² + … = 5 + p/(1-p).

8. In the previous question, let p* be the threshold such that when p ≥ p* cooperation is sustainable as a subgame perfect equilibrium by the "grim trigger" strategy. What is p*?
Solution:
• The continuation payoff from indefinite cooperation is
  4 + 4p + 4p² + … = 4 + 4p/(1-p)
• The continuation payoff from defecting (and hence being punished forever) is
  5 + p + p² + … = 5 + p/(1-p)
• To sustain cooperation we need
  4 + 4p/(1-p) ≥ 5 + p/(1-p), i.e. 3p/(1-p) ≥ 1, thus p* = 1/4.
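The threshold can also be found by exact search (a sketch; the two value functions are the continuation payoffs 4/(1-p) for cooperating and 5 + p/(1-p) for defecting against grim trigger):

```python
from fractions import Fraction

def cooperate_value(p):
    return 4 / (1 - p)           # 4 + 4p + 4p^2 + ...

def defect_value(p):
    return 5 + p / (1 - p)       # 5 + p + p^2 + ...

# Smallest p on a fine grid (exact rational arithmetic) at which
# cooperation weakly beats deviation.
p_star = next(Fraction(k, 1000) for k in range(1, 1000)
              if cooperate_value(Fraction(k, 1000))
              >= defect_value(Fraction(k, 1000)))
print(p_star)  # 1/4
```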

9. Consider a repeated game such that with probability p the game continues to the next period and with probability (1-p) it ends. The game starts in period 1. In odd periods both players cooperate and in even periods both defect.

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,5    |
| Defect    | 5,0       | 1,1    |

What is the expected total future payoff (starting at the beginning of the game) for each player, when play is forecast to proceed as described above?

Solution: Each player gets a payoff of 3 in every odd period and 1 in every even period:
3 + p + 3p² + p³ + …

10. Consider a repeated game such that with probability p the game continues to the next period and with probability (1-p) it ends.
Let p* be the threshold such that when p ≥ p* cooperation is sustainable as a subgame perfect equilibrium by the "grim trigger" strategy (under which each player cooperates as long as the other does and defects forever after if either player deviates), and when p < p* it is not sustainable. What is p*?

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 2,2       | 0,5    |
| Defect    | 5,0       | 1,1    |

Solution:
• The continuation payoff from indefinite cooperation is
  2 + 2p + 2p² + … = 2 + 2p/(1-p)
• The continuation payoff from defecting (and hence being punished forever) is
  5 + p + p² + … = 5 + p/(1-p)
• To sustain cooperation we need
  2 + 2p/(1-p) ≥ 5 + p/(1-p), i.e. p/(1-p) ≥ 3, which holds for p ≥ 3/4.
Hence p* = 3/4.

11. Consider the following game:

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,5    |
| Defect    | 5,0       | 1,1    |

Which per-period payoff vector is not both feasible and enforceable?
a) (3,3)
b) (2.25,2.25)
c) (1,1)
d) (5,0)
Solution: The minmax value of each player is 1. Thus (5,0) is not enforceable, since it gives player 2 a payoff lower than her minmax value.

12. Consider the following game:

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 4,4       | 0,5    |
| Defect    | 5,0       | 3,3    |

Which payoff vector is both feasible and enforceable?
a) (6,6)
b) (5,2)
c) (2.5,2.5)
d) (4,4)
Solution: Here each player's minmax value is 3 (defecting guarantees at least 3). (6,6) is not feasible, while in every other option except (4,4) at least one player gets a payoff below her minmax value, so those are not enforceable.

13. Consider a scaled-down model of the workings of OPEC. Countries aim to collude on production to drive up price and profits, and revert to the static equilibrium if someone deviates.
Let P = 300 - 5Q be the world demand for oil, where Q is total production and qi is the production of country i. Let the marginal cost of production be c for all countries. Assign a suitable payoff function giving the profit of each country.
Solution:
• The higher the cost c, the lower the profit.
• For a given price, the more a country produces, the higher its profit.
• Profit also depends on the price P.
One such function is: (P - c)qi

14. Using the payoff function from the previous question, find the static Nash equilibrium (where each country maximises its own profit). Assume there are 4 countries and c = 20.
Solution: Profit = (P - c)qi = (300 - 5(q-i + qi) - c)qi, where q-i is the total production of the other countries.
Setting the derivative with respect to qi to zero:
300 - 5(q-i + 2qi) - c = 0
Best response for country i:
qi = (300 - c)/10 - (q-i)/2
Look for a symmetric solution (all quantities equal to q, so q-i = 3q):
300 - 5(5q) - c = 0, i.e. 25q = 280, so q = 11.2
P = 300 - 5Q = 300 - 5(44.8) = 76
Thus the profit of each country is (76 - 20)(11.2) = 627.2
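The symmetric equilibrium above can be verified with exact arithmetic (a sketch; the variable names are mine):

```python
from fractions import Fraction

n, c = 4, 20
# Symmetric solution of the first-order condition 300 - 5(q_{-i} + 2q_i) - c = 0
# with q_{-i} = (n-1)q and q_i = q, which gives (n+1)*5*q = 300 - c.
q = Fraction(300 - c, 5 * (n + 1))
P = 300 - 5 * n * q
profit = (P - c) * q

# Check that q really is a best response to the other countries' output.
best_response = Fraction(300 - c, 10) - (n - 1) * q / 2
assert best_response == q

print(float(q), float(P), float(profit))  # 11.2 76.0 627.2
```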

15. Now suppose the countries try to enforce a "grim trigger" strategy: each produces qi = 7 unless someone deviates; if anyone deviates, each produces q = 11.2 forever after. Is this enforceable and feasible?
Solution:
In that case P = 300 - 5(28) = 160.
Each country's profit is (160 - 20)(7) = 980 per period, which is greater than the profit obtained in the static Nash equilibrium, so the collusive payoff is both feasible and enforceable.

16. Suppose that in the previous question a country deviates optimally, producing its best response to the others' collusive output:
qi = 28 - 21/2 = 17.5, earning a one-period profit of (300 - 5(21 + 17.5) - 20)(17.5) = 1531.25. Calculate the threshold discount factor β above which such a deviation is unprofitable.
Solution: Optimal deviation, followed by Nash reversion forever:
Profit: 1531.25 + 627.2β/(1 - β)
Sticking to the collusive path:
Profit: 980 + 980β/(1 - β)
These are equal when 551.25 = 352.8β/(1 - β), i.e. β = 551.25/904.05 = 25/41 ≈ 0.61.
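The threshold can be checked with exact arithmetic. A sketch, assuming a one-period deviation profit of 1531.25, a per-period punishment profit of 627.2 (the static Nash profit), and a per-period collusive profit of 980:

```python
from fractions import Fraction

deviation = Fraction('1531.25')   # one-period profit from the optimal deviation
punishment = Fraction('627.2')    # per-period Nash profit afterwards
collusion = Fraction(980)         # per-period collusive profit

# Indifference: deviation + punishment*b/(1-b) = collusion + collusion*b/(1-b)
# => (deviation - collusion) = (collusion - punishment) * b/(1-b)
gain = deviation - collusion
loss = collusion - punishment
beta_star = gain / (gain + loss)
print(beta_star, float(beta_star))  # 25/41, roughly 0.61
```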

17. Consider the following game:
The per-period payoff/profit for country i is (P - c)qi = (300 - 5(q-i + qi) - 20)qi, and β is the discount factor. Consider a "grim trigger" threat as part of a strategy: if there is a deviation from the prescribed production, go to producing q = 11.2 forever after.
If each country produces qi = 10, which of the following statements is wrong?
a) The resulting price is 300 - 5(40) = 100;
b) The profit of each country is (100 - 20)(10) = 800 M$/day;
c) The Nash equilibrium profit is 627.2 M$/day;
d) Producing q = 10 is not sustainable with the grim trigger threat described above for any β.
Solution:
• When cooperating, each country earns 800 per period, for a payoff of 800 + 800β/(1 - β).
• When deviating optimally (qi = 28 - 30/2 = 13), a country earns (300 - 5(43) - 20)(13) = 845 for one period, and 627.2 per period ever after, for a payoff of 845 + 627.2β/(1 - β).
• To sustain q = 10 we need 800 + 800β/(1 - β) ≥ 845 + 627.2β/(1 - β), which holds for β close enough to 1.
Hence d) is wrong.

18. Consider the following variation of the Prisoner's Dilemma:

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,10   |
| Defect    | 10,0      | 1,1    |

The game is played as follows: in even periods play (D,C) and in odd periods play (C,D); if anyone ever deviates, play (D,D) forever. Calculate the minimum discount factor β for which the players will follow these rules.
Solution:
• If the rules are followed then, from the point of view of the player who cooperates first, the expected payoff is
  0 + 10β + 0β² + 10β³ + … = 10β/(1 - β²)
• If that player deviates (so that (D,D) is played forever), the expected payoff is
  1 + β + β² + β³ + … = 1/(1 - β)
• Following the rules is optimal when 10β/(1 - β²) ≥ 1/(1 - β), i.e. 10β/(1 + β) ≥ 1, which gives β ≥ 1/9.
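A numerical check of β = 1/9 (a sketch; the two series are truncated at a long horizon):

```python
def follow_rules(beta, horizon=500):
    """Payoff stream 0, 10, 0, 10, ... for the player who cooperates first."""
    return sum(beta**t * (10 if t % 2 == 1 else 0) for t in range(horizon))

def always_defect(beta, horizon=500):
    """Payoff stream 1, 1, 1, ... after a deviation."""
    return sum(beta**t for t in range(horizon))

b = 1 / 9
print(follow_rules(b))   # ~1.125 = 10b/(1 - b^2)
print(always_defect(b))  # ~1.125 = 1/(1 - b): indifferent exactly at b = 1/9
```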

19. Consider strategies such that in odd periods (D,C) is played and in even periods (C,D) is played. If there is any deviation, then (D,D) is played forever after.

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 3,3       | 0,8    |
| Defect    | 8,0       | 2,2    |

What is the threshold p* such that when p ≥ p* the taking-turns strategy combination is sustainable by the grim trigger threat described above, and when p < p* it is not?

Solution: The binding constraint is for the player who is about to cooperate (and receive 0). Her continuation payoff from following the rules is
0 + 8p + 0p² + 8p³ + … = 8p/(1 - p²)
Her continuation payoff from deviating (after which (D,D) is played forever) is
2 + 2p + 2p² + … = 2/(1 - p)
Taking turns is sustainable when 8p/(1 - p²) ≥ 2/(1 - p), i.e. 8p/(1 + p) ≥ 2, which gives p ≥ 1/3. Hence p* = 1/3.

20. In an infinitely repeated Prisoner's Dilemma, a version of what is known as a "tit for tat" strategy of player i is described as follows:
• There are two "statuses" that player i might be in during any period: "normal" and "revenge";
• In the normal status player i cooperates;
• In the revenge status player i defects;
• From the normal status, player i switches to the revenge status in the next period only if the other player defects in this period;
• From the revenge status, player i automatically switches back to the normal status in the next period, regardless of the other player's action in this period.
Consider an indefinitely repeated game such that with probability p the game continues to the next period and with probability (1-p) it ends.

| 1/2       | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | 4,4       | 0,5    |
| Defect    | 5,0       | 1,1    |

What is the payoff of player 2 from always cooperating when player 1 uses this tit-for-tat strategy and begins in the normal status? What about always defecting when player 1 begins in the normal status?
Solution:
• If player 2 always cooperates, then player 1 stays in the normal status and always cooperates as well, so each player earns 4 in every period: 4 + 4p + 4p² + … = 4/(1-p).

• If player 2 always defects, then player 1 is in the normal status in odd periods and in the revenge status in even periods (because player 2 keeps defecting). Player 1 thus cooperates in odd periods and defects in even periods, so player 2 earns 5 in odd periods and 1 in even periods: 5 + p + 5p² + p³ + …
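The two-status automaton can be simulated directly (a sketch; the encoding is my own):

```python
STAGE = {('C', 'C'): (4, 4), ('C', 'D'): (0, 5),
         ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play_vs_tit_for_tat(p2_actions):
    """Player 1 runs the two-status tit for tat, starting in 'normal';
    returns player 2's per-period payoffs."""
    status, result = 'normal', []
    for a2 in p2_actions:
        a1 = 'C' if status == 'normal' else 'D'
        result.append(STAGE[(a1, a2)][1])
        # Normal -> revenge only if the opponent defected this period;
        # revenge -> normal automatically next period.
        status = 'revenge' if (status == 'normal' and a2 == 'D') else 'normal'
    return result

print(play_vs_tit_for_tat(['C'] * 6))  # [4, 4, 4, 4, 4, 4]
print(play_vs_tit_for_tat(['D'] * 6))  # [5, 1, 5, 1, 5, 1]
```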

21. In the previous question, what is the threshold p* such that when p ≥ p* always cooperating is a best response by player 2 to player 1 playing tit for tat and starting in the normal status, but when p < p* it is not?
Solution: From the previous question, sustaining cooperation requires
4 + 4p + 4p² + 4p³ + … ≥ 5 + p + 5p² + p³ + …
Writing both sides as multiples of 1 + p² + p⁴ + …, this reduces to 4 + 4p ≥ 5 + p, thus p ≥ 1/3.
p* = 1/3.

22. Consider the following indefinitely repeated Prisoner's Dilemma. Show that (tit-for-tat, tit-for-tat) is a subgame perfect equilibrium of this game with discount factor β if and only if y - x = 1 and β = 1/x.

| 1/2 | C   | D   |
|-----|-----|-----|
| C   | x,x | 0,y |
| D   | y,0 | 1,1 |

Solution: Suppose that player 2 adheres to tit-for-tat. Consider player 1’s behavior in subgames
following histories that end in each of the following outcomes.
• (C, C) If player 1 adheres to tit-for-tat the outcome is (C, C) in every period, so that her discounted average payoff in the subgame is x. If she chooses D in the first period of the subgame, then adheres to tit-for-tat, the outcome alternates between (D, C) and (C, D), and her discounted average payoff is y/(1 + β). Thus we need x ≥ y/(1 + β), or β ≥ (y − x)/x, for a one-period deviation from tit-for-tat not to be profitable for player 1.
• (C, D) If player 1 adheres to tit-for-tat the outcome alternates between (D, C) and (C, D), so that her discounted average payoff is y/(1 + β). If she deviates to C in the first period of the subgame, then adheres to tit-for-tat, the outcome is (C, C) in every period, and her discounted average payoff is x. Thus we need y/(1 + β) ≥ x, or β ≤ (y − x)/x, for a one-period deviation from tit-for-tat not to be profitable for player 1.
• (D, C) If player 1 adheres to tit-for-tat the outcome alternates between (C, D) and (D, C), so that her discounted average payoff is βy/(1 + β). If she deviates to D in the first period of the subgame, then adheres to tit-for-tat, the outcome is (D, D) in every period, and her discounted average payoff is 1. Thus we need βy/(1 + β) ≥ 1, or β ≥ 1/(y − 1), for a one-period deviation from tit-for-tat not to be profitable for player 1.
• (D, D) If player 1 adheres to tit-for-tat the outcome is (D, D) in every period, so that her discounted average payoff is 1. If she deviates to C in the first period of the subgame, then adheres to tit-for-tat, the outcome alternates between (C, D) and (D, C), and her discounted average payoff is βy/(1 + β). Thus we need 1 ≥ βy/(1 + β), or β ≤ 1/(y − 1), for a one-period deviation from tit-for-tat not to be profitable for player 1.
The same arguments apply to deviations by player 2, so we conclude that (tit-for-tat, tit-for-tat) is a subgame perfect equilibrium if and only if β = (y − x)/x and β= 1/(y − 1), or y − x = 1 and β= 1/x.

23. The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game. The set of feasible payoff pairs of a two-player strategic game can be represented graphically as the set of weighted averages of the pure payoff points. Represent the feasible payoff profiles of the following Prisoner's Dilemma:

| 1/2 | C   | D   |
|-----|-----|-----|
| C   | 2,2 | 0,3 |
| D   | 3,0 | 1,1 |

Solution: The set of all feasible payoff profiles is the convex hull of the points (2,2), (0,3), (3,0) and (1,1): the quadrilateral with those four vertices.

24. Here we define Bertrand's version of the duopoly game.
A single good is produced by n firms; each firm can produce qi units of the good at a cost of Ci(qi). A demand function D is given, with the interpretation that if the good is available at price p then the total amount demanded is D(p).
Assume that if the firms set different prices then all consumers purchase the good from the firm with the lowest price, which produces enough output to meet this demand. If more than one firm sets the lowest price, all the firms doing so share the demand at that price equally. A firm whose price is not the lowest receives no demand and produces no output. (Note that a firm does not choose its output strategically; it simply produces enough to satisfy all the demand it faces at the given prices, even if its price is below its unit cost, in which case it makes a loss.)
Firm i's preferences are represented by its profit, equal to piD(pi)/m - Ci(D(pi)/m) if firm i is one of the m firms setting the lowest price (m = 1 if firm i's price pi is lower than every other price), and equal to zero if some firm's price is lower than pi.
Now suppose each firm's unit cost is a constant c. Let f(p) = (p - c)D(p) for every price p, and assume D is such that f is a continuous function with a single maximiser, denoted pm.
Let si be the strategy of firm i in the indefinitely repeated game that charges pm in the first period and continues to do so as long as the other firm charges pm, and that punishes any deviation by the other firm by charging c for k periods before reverting to pm. Given any value of β, for what values of k is the strategy pair (s1, s2) a Nash equilibrium of the indefinitely repeated game?
Solution: Suppose that firm i uses the strategy si. If the other firm, j, also uses sj, then its discounted average payoff is
(1/2)(1 - β)[f(pm) + βf(pm) + …] = (1/2)f(pm).
If, on the other hand, firm j deviates to a price p, then the closer this price is to pm, the higher is j's profit, because the punishment does not depend on p. Thus by choosing p close enough to pm the firm can obtain a profit as close as it wishes to f(pm) in the period of its deviation. Its profit during the k periods of punishment is zero. Once the punishment is complete, it can either revert to pm or deviate again. If it can profit from deviating initially then it can profit from deviating again once its punishment is complete, so its maximal discounted average payoff from deviating is
(1 - β)[f(pm) + β^(k+1)f(pm) + β^(2k+2)f(pm) + …] = (1 - β)f(pm)/(1 - β^(k+1)).
Thus for (s1, s2) to be a Nash equilibrium we need
(1 - β)/(1 - β^(k+1)) ≤ 1/2,
or
β^(k+1) - 2β + 1 ≤ 0.
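The final inequality is easy to explore numerically (a sketch; the helper name is mine):

```python
def equilibrium_punishments(beta, k_max=30):
    """Punishment lengths k for which beta**(k+1) - 2*beta + 1 <= 0."""
    return [k for k in range(1, k_max + 1)
            if beta**(k + 1) - 2 * beta + 1 <= 0]

print(equilibrium_punishments(0.6))  # every k >= 3 works when beta = 0.6
print(equilibrium_punishments(0.4))  # []: no k works when beta < 1/2
```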

25. Consider the previous game.
Let si be the following strategy for firm i in the infinitely repeated game:
• in the first period, charge the price pm;
• in every subsequent period, charge the lowest of all the prices charged by the other firm in all previous periods.
Is the strategy pair (s1, s2) a Nash equilibrium for any β < 1?
Solution: Suppose that firm i uses the strategy si. If the other firm does so too, then its discounted average payoff is (1/2)f(pm), as in the previous question. If the other firm instead deviates to some price p with c < p < pm in the first period, and maintains this price subsequently, then it obtains f(p) in the first period and shares f(p) in each subsequent period, so its discounted average payoff is
(1 - β)[f(p) + (1/2)βf(p) + (1/2)β²f(p) + …] = (1/2)(2 - β)f(p).
If p is close to pm then f(p) is close to f(pm), because f is continuous. Moreover, for any β < 1 we have 2 - β > 1, so we can find p < pm such that (2 - β)f(p) > f(pm). Hence the strategy pair is not a Nash equilibrium of the infinitely repeated game for any value of β.

26. In the previous question, suppose that firm i cannot detect a deviation until ki ≥ 1 periods have passed (for i = 1, 2). Let si be the strategy of firm i that charges a price p* until a deviation is detected and charges c subsequently. Find, as a function of β, the values of (k1, k2) for which the strategy pair (s1, s2) is a subgame perfect equilibrium.
Solution: The best deviations involve prices slightly less than p*. Such a deviation against firm i yields a discounted average payoff close to
(1 - β)[f(p*) + βf(p*) + … + β^(ki-1)f(p*)] = (1 - β^ki)f(p*),
whereas compliance with the strategy yields the discounted average payoff (1/2)f(p*). Thus the strategy pair is a subgame perfect equilibrium for any value of p* if β^k1 ≥ 1/2 and β^k2 ≥ 1/2, and is not a subgame perfect equilibrium for any value of p* if β^k1 < 1/2 or β^k2 < 1/2. That is, the most profitable price for which the strategy pair is a subgame perfect equilibrium is pm if β^k1 ≥ 1/2 and β^k2 ≥ 1/2, and is c if β^k1 < 1/2 or β^k2 < 1/2.

27. In the previous question, suppose that before the firms start choosing prices they simultaneously choose detection technologies. Each firm i may choose any positive integer ki, or θ (no detection technology); the cost of choosing θ is zero, while the cost of choosing a positive integer is positive and decreasing in the value of the integer. Study the subgame perfect equilibria of the entire game in which each firm i uses either si or the strategy that charges c in every period regardless of history.
Solution: Denote by k*i the critical value of ki found in the previous question (that is, β^k*i ≥ 1/2 and β^(k*i+1) < 1/2). If ki > k*i then no change in kj affects the outcome of the price-setting subgame, so j's best action at the start of the game is θ, in which case i's best action is also θ. Thus in one subgame perfect equilibrium both firms choose θ at the start of the game and charge c, regardless of history, in the rest of the game.
If ki ≤ k*i then j's best action is k*j, provided the cost of choosing k*j is at most (1/2)f(pm). Thus if the cost of choosing k*i is at most (1/2)f(pm) for each firm, the game has another subgame perfect equilibrium, in which each firm i chooses k*i at the start of the game and follows the strategy si in the price-setting subgame.

28. In the setting of the previous question, does a promise to beat the price charged by the other firm promote or inhibit competition?
Solution: A promise by firm i to beat the other firm's price gives consumers an incentive to inform firm i of deviations by the other firm, and thus reduces its detection time. To this extent, such a promise tends to promote collusion, that is, to inhibit competition.