A Tutorial on Repeated Games

Repeated Games
Many interactions occur more than once, e.g. between buyers and sellers, or among firms in an economy. The question is whether repetition changes the nature of the interaction. For example, firms may collude, or altruistic behaviour may arise in a society.

Example: The Prisoner's Dilemma
Analysis (finite repetition): In the last period, regardless of what has happened before, the unique best response of each player is to defect. By backward induction, play in the second-to-last period cannot affect the last period, so the second-to-last period is played like a one-shot game and both players defect. Repeating the argument, every period is played as if it were the last.

Definitions
Stage game: the game being played each period.
History: a list of what has happened in every period up to the present.
Strategies: a strategy specifies an action in the stage game after every possible history.
Payoffs: look at the example of the Prisoner's Dilemma. With finitely many periods t, a natural payoff is the sum of expected stage payoffs,
$\sum_{x=1}^{t} E[u_i^x]$
In games of infinite repetition there are two common ways to define payoffs:
Limit average reward: $\liminf_{t\to\infty} \frac{1}{t}\sum_{x=1}^{t} E[u_i^x]$, e.g. if stage payoffs are 3, 1, 3, 1, …, the payoff is 2.
Future discounted reward: $\sum_{x=1}^{\infty} E[\beta^{x-1} u_i^x]$, e.g. if stage payoffs are 3, 1, 3, 1, … and the discount factor is β = 0.9, then the payoff is $3 + 1(0.9) + 3(0.9)^2 + 1(0.9)^3 + \dots$
The discount factor β takes into account that the present is more important than the future. The definition of Nash equilibrium remains unchanged.

Subgame Perfect Equilibrium
Indefinite Repetition
Consider the Prisoner's Dilemma example.

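1. Consider the following stage game between a predator (row) and a prey (column), repeated a finite number of times:
Pred/Prey  Active  Passive
Active  1.7,-0.8  3,-1
Passive  1.6,-0.7  0,0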
Which is true about the results from backward induction?
a) (Passive, Active) can appear in some period;
b) (Passive, Passive) can appear in some period;
c) (Active, Passive) can appear in some period;
d) Only (Active, Active) appears in each period.
Solution: In the one-shot game, the only Nash equilibrium is (Active, Active), so that must be played in the last period regardless of what comes before it.
Thus play in the second-to-last period has no impact on the last period, and so (Active, Active) is played there as well.
Following the same logic as in the finitely repeated Prisoner's Dilemma, (Active, Active) is the only possible outcome in all periods.
Thus d) is true.
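The first step of the backward-induction argument, finding the one-shot Nash equilibrium, can be checked mechanically. The following is a minimal sketch, not part of the original tutorial: the function name `pure_nash` and the dictionary layout are illustrative choices, and the prey's payoffs are entered as negative numbers, an assumption that matches the stated solution.

```python
from itertools import product

def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        # Neither player may gain from a unilateral deviation.
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Predator (row) / prey (column); negative prey payoffs are an assumption.
pred_prey = {
    ("Active", "Active"): (1.7, -0.8),
    ("Active", "Passive"): (3.0, -1.0),
    ("Passive", "Active"): (1.6, -0.7),
    ("Passive", "Passive"): (0.0, 0.0),
}
print(pure_nash(pred_prey))  # [('Active', 'Active')]
```

Because the stage game has a single equilibrium, backward induction repeats it in every period of the finitely repeated game.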
2. Consider the following game:
1/2  Cooperate  Defect 
Cooperate  3,3  0,5 
Defect  5,0  1,1 
Write out H1, the set of possible histories after one period.
Solution:
H1 = {(C,C), (C,D), (D,C), (D,D)}
3. Consider the matching-pennies game:
1/2  Left  Right
Left  1,-1  -1,1
Right  -1,1  1,-1
How many elements are there in H3?
Solution: H1 has 4 elements: (L,L), (L,R), (R,L) and (R,R).
H2 has 4 × 4 = 16 elements of the form (h1, h2), where h1 and h2 each take one of the 4 values in H1.
H3 has 4 × 4 × 4 elements of the form (h1, h2, h3), where each ht takes one of 4 possible values.
Hence the answer is $4^3 = 64$.
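The counting argument can be confirmed by enumerating the histories directly. This is a small illustrative sketch; the helper name `histories` is hypothetical and not from the tutorial.

```python
from itertools import product

def histories(actions_1, actions_2, t):
    """Enumerate H^t: every sequence of t action profiles."""
    profiles = list(product(actions_1, actions_2))  # the elements of H^1
    return list(product(profiles, repeat=t))

h3 = histories(["L", "R"], ["L", "R"], 3)
print(len(h3))  # 64
```

The same helper works for stage games with different numbers of actions per player, since only the number of action profiles matters.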
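4. Consider the following game between a seller (row) and a buyer (column):
Seller/Buyer  Buy  Not
Good  2,2  -1,0
Bad  4,-1  0,0
Play this game 50 times. Which is true about the results from backward induction?
a) Only (Bad, Buy) appears in each period.
b) Only (Bad, Not) appears in each period.
c) Only (Good, Buy) appears in each period.
d) Only (Good, Not) appears in each period.
Solution: In the one-shot game, Bad is a dominant action for the seller and the buyer's best response to Bad is Not, so (Bad, Not) is the unique Nash equilibrium. Using the argument from question 1, it is played in every period, so b) is true.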
5. Consider the matching-pennies game:
1/2  Left  Middle  Right 
Left  2,2  2,2  2,2 
Right  2,2  2,2  2,2 
How many elements are there in H3?
Solution: H1 has 2 × 3 = 6 elements and H2 has 6 × 6 = 36 elements; therefore H3 has $6^3 = 216$ elements.
6. Consider an indefinitely repeated game such that with probability p the game continues to the next period and with probability (1 − p) it ends.
The "grim trigger" strategy is such that a player cooperates as long as the other does and defects forever after if the other player defects.
1/2  Cooperate  Defect 
Cooperate  4,4  0,5 
Defect  5,0  1,1 
If the other player uses a grim trigger strategy, what is the total expected payoff from always cooperating?
Solution: If a player always cooperates, then given that the other player uses the grim trigger strategy, the other player always cooperates as well. Thus each earns 4 in every period, for a total expected payoff of $4 + 4p + 4p^2 + \dots = 4/(1-p)$.
7. In the previous question, what is the total expected payoff from always defecting?
Solution: If a player always defects, then given that the other uses grim trigger, the other player cooperates in period 1 but defects from period 2 onwards. Thus the defecting player earns 5 in period 1 and 1 in every period from 2 onwards, for a total of $5 + p + p^2 + \dots = 5 + p/(1-p)$.
8. In the previous question, let p* be the threshold such that when p ≥ p*, cooperation is sustainable as a subgame perfect equilibrium by the "grim trigger" strategy. What is p*?
Solution:
 The continuation payoff from indefinite cooperation is $4 + 4p + 4p^2 + \dots = 4/(1-p)$.
 The continuation payoff from defecting (and thus continued defection) is $5 + p + p^2 + \dots = 5 + p/(1-p)$.
 In order to sustain cooperation, we need $4/(1-p) \ge 5 + p/(1-p)$, i.e. $4 \ge 5 - 4p$, so $p \ge 1/4$.
Hence p* = 1/4.
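The grim trigger comparison has a simple closed form. The sketch below is illustrative only: the parameter names `reward`, `temptation` and `punishment` (the payoffs from mutual cooperation, unilateral defection and mutual defection) are labels introduced here, not the tutorial's notation.

```python
from fractions import Fraction

def cooperate_value(reward, p):
    """reward + reward*p + reward*p**2 + ... = reward / (1 - p)."""
    return reward / (1 - p)

def defect_value(temptation, punishment, p):
    """temptation now, then punishment in every later period."""
    return temptation + p * punishment / (1 - p)

def grim_trigger_threshold(reward, temptation, punishment):
    """Smallest p with cooperate_value >= defect_value.

    reward/(1-p) >= temptation + p*punishment/(1-p) rearranges to
    p >= (temptation - reward) / (temptation - punishment).
    """
    return Fraction(temptation - reward, temptation - punishment)

p_star = grim_trigger_threshold(reward=4, temptation=5, punishment=1)
print(p_star)  # 1/4
```

At p = p* the two continuation payoffs coincide, which is a quick way to sanity-check a threshold computed by hand.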
9. Consider a repeated game such that with probability p the game continues to the next period and with probability (1 − p) it ends. The game starts in period 1. In odd periods both players cooperate and in even periods both defect.
1/2  Cooperate  Defect 
Cooperate  3,3  0,5 
Defect  5,0  1,1 
What is the expected total future payoff (starting at the beginning of the game) for each player when the game is forecast to be played as described above?
Solution: In odd periods each player gets a payoff of 3, while in even periods each gets a payoff of 1. The expected total payoff is
$3 + p + 3p^2 + p^3 + \dots = (3 + p)/(1 - p^2)$
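The alternating series can be compared against a long truncated sum. This is a hedged numerical sketch; the function names and the choice of 400 periods as a stand-in for an infinite horizon are assumptions made here for illustration.

```python
def discounted_sum(stream, p):
    """Expected total payoff of a finite stream when each further
    period is reached with probability p."""
    return sum(payoff * p**t for t, payoff in enumerate(stream))

def alternating_value(odd_payoff, even_payoff, p):
    """Closed form of odd + even*p + odd*p**2 + even*p**3 + ..."""
    return (odd_payoff + even_payoff * p) / (1 - p**2)

p = 0.5
approx = discounted_sum([3, 1] * 200, p)  # 400 periods, truncation error is negligible
exact = alternating_value(3, 1, p)
print(approx, exact)
```

Grouping the periods in pairs is what produces the $1 - p^2$ denominator: each pair is worth $3 + p$ and is reached with probability $p^2$ after the previous pair.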
10. Consider a repeated game such that with probability p the game continues to the next period and with probability (1 − p) it ends.
Let p* be the threshold such that when p ≥ p*, cooperation is sustainable as a subgame perfect equilibrium by the "grim trigger" strategy (under which each player cooperates as long as the other does and defects forever after if either player deviates), and when p < p* it is not sustainable. What is p*?
1/2  Cooperate  Defect 
Cooperate  2,2  0,5 
Defect  5,0  1,1 
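Solution:
 The continuation payoff from indefinite cooperation is $2 + 2p + 2p^2 + \dots = 2/(1-p)$.
 The continuation payoff from defecting (and thus continued defection) is $5 + p + p^2 + \dots = 5 + p/(1-p)$.
 In order to sustain cooperation, we need $2/(1-p) \ge 5 + p/(1-p)$, i.e. $2 \ge 5 - 4p$, so $p \ge 3/4$.
Hence p* = 3/4.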
11. Consider the following stage game:
1/2  Cooperate  Defect 
Cooperate  3,3  0,5 
Defect  5,0  1,1 
Which per period payoff is not both feasible and enforceable:
a) (3, 3);
b) (2.25,2.25)
c) (1,1)
d) (5,0)
Solution: The maximin value of each player is 1. Thus (5, 0) is not enforceable, since it gives player 2 an expected payoff lower than her maximin value. The answer is d).
12. Consider the following stage game:
1/2  Cooperate  Defect
Cooperate  4,4  0,5
Defect  5,0  3,3
Which payoff is both feasible and enforceable?
a) (6,6)
b) (5,2)
c) (2.5,2.5)
d) (4,4)
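Solution: (6,6) is not feasible, since no player can earn more than 5 in any period. Each player's maxmin value is 3, so (5,2) and (2.5,2.5) are not enforceable: at least one player gets less than her maxmin value. Only d) (4,4) is both feasible and enforceable.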
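Both tests can be sketched in code for two-player games. The following is an illustration under stated assumptions: feasibility is checked as membership in the convex hull of the four payoff profiles, and the punishment level is computed with pure actions only (which is enough in these games, where Defect is dominant). All function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def feasible(point, payoffs):
    """True if point is a weighted average of the stage payoff profiles."""
    hull = convex_hull(list(payoffs.values()))
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

def pure_minmax(payoffs):
    """Lowest payoff each player can be held to by a pure-action punisher."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    v1 = min(max(payoffs[(r, c)][0] for r in rows) for c in cols)
    v2 = min(max(payoffs[(r, c)][1] for c in cols) for r in rows)
    return v1, v2

def enforceable(point, payoffs):
    v1, v2 = pure_minmax(payoffs)
    return point[0] >= v1 and point[1] >= v2

game_12 = {("C", "C"): (4, 4), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (3, 3)}
for option in [(6, 6), (5, 2), (2.5, 2.5), (4, 4)]:
    print(option, feasible(option, game_12), enforceable(option, game_12))
```

The hull test assumes the payoff profiles are not all on one line; a general treatment of enforceability would also allow mixed punishments.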
13. Consider a scaled-down model of the workings of OPEC. Countries aim to collude on production to drive up the price and profits, and return to the static equilibrium if someone deviates.
Let $P = 300 - 5Q$ be world demand for oil, where Q is total production and $q_i$ is the production of country i. Let the marginal cost of production be c for all countries. Assign a suitable payoff function that gives the profit of each country.
Solution:
 The higher the value of c, the lower the profit.
 At a given price, the higher the production $q_i$, the higher the profit.
 Profit also depends on the price P, which falls as total production Q rises.
A payoff function with these properties is the profit $(P - c)\,q_i = (300 - 5Q - c)\,q_i$.
14. Using the payoff function of the previous question, find the static Nash equilibrium (where each country maximises its own profit). Assume there are 4 countries and c = 20.
Solution: Write $q_{-i}$ for the total production of the other countries. Then
Profit $= (P - c)\,q_i = (300 - 5(q_i + q_{-i}) - c)\,q_i$
Setting the derivative with respect to $q_i$ to 0:
$300 - 5(2q_i + q_{-i}) - c = 0$
Best response for country i:
$q_i = (300 - c)/10 - q_{-i}/2$
Look for a symmetric solution (all q equal, so $q_{-i} = 3q$):
$5q = (300 - c)/5$, so $q = 11.2$
$P = 300 - 5Q = 300 - 5(44.8) = 76$
Thus the profit of each country is $(76 - 20)(11.2) = 627.2$.
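The symmetric equilibrium can be recomputed for any number of countries. This is a minimal sketch assuming linear inverse demand P = a − bQ and constant marginal cost c; the parameter names `a`, `b`, `n` are introduced here for illustration.

```python
def best_response(q_others, a=300.0, b=5.0, c=20.0):
    """Profit-maximising output against total rival output q_others."""
    return max(0.0, (a - c) / (2 * b) - q_others / 2)

def symmetric_equilibrium(n=4, a=300.0, b=5.0, c=20.0):
    """Solve q = best_response((n - 1) * q), i.e. q = (a - c) / (b * (n + 1))."""
    q = (a - c) / (b * (n + 1))
    price = a - b * n * q
    return q, price, (price - c) * q

q, price, profit = symmetric_equilibrium()
print(q, price, profit)
```

With the default values this reproduces, up to floating-point rounding, the output of 11.2, the price of 76 and the profit of 627.2 derived above, and `best_response(3 * q)` returns q again.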
15. Now suppose the countries try to enforce a "grim trigger" strategy: each keeps $q_i = 7$ unless someone deviates; after a deviation, each produces q = 11.2 forever. Is it enforceable and feasible?
Solution:
In that case $Q = 28$ and $P = 300 - 5(28) = 160$.
Each country's profit is $(160 - 20)(7) = 980$, which is greater than the 627.2 obtained in the Nash equilibrium.
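16. Suppose in the previous question a country deviates in the following fashion:
$q_i = 28 - 21/2 = 17.5$ (its best response to the other three countries producing 21 in total), thus earning a profit of 1531.25. Calculate the threshold discount factor β above which collusion is sustainable.
Solution: Optimal deviation
Profit: $1531.25 + 627.2\,\beta/(1-\beta)$
Stick to equilibrium
Profit: $980 + 980\,\beta/(1-\beta)$
These are equal if $\beta = 551.25/904.05 \approx 0.61$, so collusion is sustainable for β of at least about 0.61.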
17. Consider the following game:
The per-period payoff (profit) of country i is $(P - c)\,q_i = (300 - 5(q_i + q_{-i}) - 20)\,q_i$, and β is the discount factor. Consider a "grim trigger" threat as part of a strategy: if there is a deviation from the prescribed production, go to producing q = 11.2 forever after.
If each country produces q = 10, which of the following statements is wrong?
a) The resulting price is 300 − 5(40) = 100;
b) The profit of each country is (100 − 20)(10) = 800 M$/day;
c) The Nash equilibrium profit is 627.2 M$/day;
d) Producing q = 10 is not sustainable with the grim trigger threat described above for any β.
Solution:
 When cooperating, each country earns 800 per day, for a total payoff of 800 + 800β/(1 − β).
 When deviating (optimally), a country earns 845 for one day and 627.2 per day ever after, for a total payoff of 845 + 627.2β/(1 − β).
 In order to sustain q = 10, it must be that 800 + 800β/(1 − β) is at least as large as 845 + 627.2β/(1 − β), i.e. β ≥ 45/217.8 ≈ 0.21, which holds for β close to 1.
Hence d) is the wrong statement.
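The threshold discount factor for any production quota follows from the same three profit numbers: collusive profit, best one-period deviation profit, and the Nash profit used as punishment. The sketch below is illustrative; the constant and function names are assumptions, and the punishment profit is the 627.2 from question 14.

```python
A, B, C, N = 300.0, 5.0, 20.0, 4  # demand intercept, demand slope, marginal cost, countries

def profit(q_own, q_others):
    price = A - B * (q_own + q_others)
    return (price - C) * q_own

def critical_discount(quota):
    """Discount factor at which colluding forever and deviating once pay the same."""
    q_ne = (A - C) / (B * (N + 1))            # static Nash output
    others = (N - 1) * quota
    q_dev = (A - C) / (2 * B) - others / 2    # best response to the quota
    collude = profit(quota, others)
    deviate = profit(q_dev, others)
    punish = profit(q_ne, (N - 1) * q_ne)
    # collude/(1-b) = deviate + b*punish/(1-b)  =>  b = (deviate-collude)/(deviate-punish)
    return (deviate - collude) / (deviate - punish)

print(round(critical_discount(7), 2), round(critical_discount(10), 2))  # 0.61 0.21
```

A quota closer to the Nash output is easier to sustain because the one-period gain from deviating shrinks faster than the loss from punishment.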
18. Consider the following variation of the Prisoner's Dilemma:
1/2  Cooperate  Defect 
Cooperate  3,3  0,10 
Defect  10,0  1,1 
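The game is played as follows:
in even periods play (D,C) and in odd periods play (C,D); if anyone ever deviates, play (D,D) forever. Calculate the minimum value of the discount factor β for which the players will follow the rules.
Solution:
 If the rules are followed, the player who is currently due to cooperate expects $0 + 10\beta + 0 + 10\beta^3 + \dots = 10\beta/(1-\beta^2)$.
 If she deviates by defecting, (D,D) is played forever and her expected payoff is $1 + \beta + \beta^2 + \dots = 1/(1-\beta)$.
Following the rules requires $10\beta/(1-\beta^2) \ge 1/(1-\beta)$, i.e. $10\beta \ge 1 + \beta$, so β ≥ 1/9.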
19. Consider strategies such that in odd periods (D, C) is played and in even periods (C, D) is played. If there is any deviation then play (D, D) forever after.
1/2  Cooperate  Defect
Cooperate  3,3  0,8 
Defect  8,0  2,2 
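What is the threshold p* such that when p ≥ p* the taking-turns strategy combination is sustainable by the grim trigger threat described above, and when p < p* taking turns is not sustainable?
Solution: Treat the continuation probability p as the discount factor. In an odd period, the expected payoff of player 1 (who defects) when the given strategy is played is
$8 + 0p + 8p^2 + 0p^3 + 8p^4 + \dots = 8/(1-p^2)$
Her expected payoff from deviating to C is
$3 + 2p + 2p^2 + 2p^3 + \dots = 3 + 2p/(1-p)$
which is smaller for every p, so she does not deviate.
The binding constraint is that of player 2 (who cooperates): following the strategy gives $0 + 8p + 0p^2 + 8p^3 + \dots = 8p/(1-p^2)$, while defecting gives $2 + 2p + 2p^2 + \dots = 2/(1-p)$.
Taking turns is sustainable when $8p/(1-p^2) \ge 2/(1-p)$, i.e. $8p \ge 2(1+p)$, so p* = 1/3.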
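For taking-turns schemes of this kind, the constraint of the player whose turn it is to cooperate reduces to a one-line formula. This is a hedged sketch: the names `temptation`, `sucker` and `punishment` (payoffs from defecting against a cooperator, cooperating against a defector, and mutual defection) are labels chosen here.

```python
from fractions import Fraction

def follow_value(first, second, beta):
    """Value of the alternating stream first, second, first, ... under discount beta."""
    return (first + second * beta) / (1 - beta**2)

def turn_taking_threshold(temptation, sucker, punishment):
    """Smallest beta at which the player currently receiving `sucker` follows the rules.

    (sucker + temptation*beta)/(1 - beta**2) >= punishment/(1 - beta)
    reduces to beta >= (punishment - sucker)/(temptation - punishment).
    """
    return Fraction(punishment - sucker, temptation - punishment)

print(turn_taking_threshold(10, 0, 1))  # 1/9
print(turn_taking_threshold(8, 0, 2))   # 1/3
```

The two calls use the payoffs (10, 0, 1) and (8, 0, 2) of the two taking-turns questions; at the threshold the follower's value equals the punishment value exactly.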
20. In an infinitely repeated Prisoner’s Dilemma, a version of what is known as a “tit for tat” strategy of a player i is described as follows:
 There are two "statuses" that player i might be in during any period: "normal" and "revenge";
 In a normal status player i cooperates;
 In a revenge status player i defects;
 From a normal status, player i switches to the revenge status in the next period only if the other player defects in this period;
 From a revenge status player i automatically switches back to the normal status in the next period regardless of the other player’s action in this period.
1/2  Cooperate  Defect
Cooperate  4,4  0,5 
Defect  5,0  1,1 
What is the payoff for player 2 from always cooperating when player 1 uses this tit for tat strategy and begins in a normal status? How about always defecting when 1 begins in a normal status?
Solution:
 If 2 always cooperates, then 1 stays 'normal' and cooperates always as well, and the payoff to each player is 4 in each period.
 If 2 always defects, then 1 is normal in odd periods and switches to revenge in even periods (because 2 defects). 1 cooperates in odd periods and defects in even periods, thus 2 earns 5 in odd periods and 1 in even periods.
21. In the previous question, what is the threshold p* such that when p ≥ p* always cooperating by player 2 is a best response to player 1 playing tit for tat and starting in a normal status, but when p < p* always cooperating is not a best response?
Solution: From the last question, in order to sustain cooperation we need $4 + 4p + 4p^2 + 4p^3 + \dots \ge 5 + p + 5p^2 + p^3 + \dots$. Grouping periods in pairs, this is $(4 + 4p)/(1-p^2) \ge (5 + p)/(1-p^2)$, i.e. $4 + 4p \ge 5 + p$, thus p ≥ 1/3.
Hence p* = 1/3.
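The two-status tit for tat described above is a small state machine, so the payoff streams can be generated by simulation. The following sketch is illustrative; the function names are hypothetical, and the payoff table is keyed by (player 1's action, player 2's action) and stores player 2's payoff.

```python
# Player 2's stage payoff, keyed by (action of player 1, action of player 2).
PAYOFF_2 = {("C", "C"): 4, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1}

def simulate(player2_action, periods):
    """Player 2's payoff stream when she repeats one action against tit for tat."""
    status = "normal"
    stream = []
    for _ in range(periods):
        a1 = "C" if status == "normal" else "D"
        stream.append(PAYOFF_2[(a1, player2_action)])
        # Normal turns to revenge only after a defection; revenge lasts one period.
        if status == "normal" and player2_action == "D":
            status = "revenge"
        else:
            status = "normal"
    return stream

def expected_total(stream, p):
    """Expected total payoff when each further period is reached with probability p."""
    return sum(u * p**t for t, u in enumerate(stream))

print(simulate("C", 4), simulate("D", 4))  # [4, 4, 4, 4] [5, 1, 5, 1]
```

Evaluating `expected_total` on long streams at p = 1/3 gives the same value for both behaviours, in line with the threshold found above.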
22. Consider the following indefinitely repeated Prisoner's Dilemma. Show that (tit-for-tat, tit-for-tat) is a subgame perfect equilibrium of this game with discount factor β if and only if y − x = 1 and β = 1/x.
1/2  C  D
C  x,x  0,y 
D  y,0  1,1 
Solution: Suppose that player 2 adheres to tit-for-tat. Consider player 1's behavior in subgames following histories that end in each of the following outcomes.
 (C, C): If player 1 adheres to tit-for-tat the outcome is (C, C) in every period, so that her discounted average payoff in the subgame is x. If she chooses D in the first period of the subgame, then adheres to tit-for-tat, the outcome alternates between (D, C) and (C, D), and her discounted average payoff is $y/(1+\beta)$. Thus we need $x \ge y/(1+\beta)$, or $\beta \ge (y-x)/x$, for a one-period deviation from tit-for-tat not to be profitable for player 1.
 (C, D): If player 1 adheres to tit-for-tat the outcome alternates between (D, C) and (C, D), so that her discounted average payoff is $y/(1+\beta)$. If she deviates to C in the first period of the subgame, then adheres to tit-for-tat, the outcome is (C, C) in every period, and her discounted average payoff is x. Thus we need $y/(1+\beta) \ge x$, or $\beta \le (y-x)/x$, for a one-period deviation from tit-for-tat not to be profitable for player 1.
 (D, C): If player 1 adheres to tit-for-tat the outcome alternates between (C, D) and (D, C), so that her discounted average payoff is $\beta y/(1+\beta)$. If she deviates to D in the first period of the subgame, then adheres to tit-for-tat, the outcome is (D, D) in every period, and her discounted average payoff is 1. Thus we need $\beta y/(1+\beta) \ge 1$, or $\beta \ge 1/(y-1)$, for a one-period deviation from tit-for-tat not to be profitable for player 1.
 (D, D): If player 1 adheres to tit-for-tat the outcome is (D, D) in every period, so that her discounted average payoff is 1. If she deviates to C in the first period of the subgame, then adheres to tit-for-tat, the outcome alternates between (C, D) and (D, C), and her discounted average payoff is $\beta y/(1+\beta)$. Thus we need $1 \ge \beta y/(1+\beta)$, or $\beta \le 1/(y-1)$, for a one-period deviation from tit-for-tat not to be profitable for player 1.
Combining the four conditions, we need $\beta = (y-x)/x$ and $\beta = 1/(y-1)$. Equating the two gives $(y-x)(y-1) = x$, i.e. $y(y-x-1) = 0$, so y − x = 1 and hence β = 1/x.
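The four one-period deviation conditions can be checked numerically for particular values of x, y and β. This is a small sketch with exact rational arithmetic; the function name `tft_is_spe` is illustrative.

```python
from fractions import Fraction

def tft_is_spe(x, y, beta):
    """Check the four one-period deviation conditions for (tit-for-tat, tit-for-tat)."""
    alternate_first = y / (1 + beta)          # discounted average of y, 0, y, 0, ...
    alternate_second = beta * y / (1 + beta)  # discounted average of 0, y, 0, y, ...
    return (
        x >= alternate_first        # after (C, C)
        and alternate_first >= x    # after (C, D)
        and alternate_second >= 1   # after (D, C)
        and 1 >= alternate_second   # after (D, D)
    )

print(tft_is_spe(2, 3, Fraction(1, 2)))  # True
print(tft_is_spe(2, 3, Fraction(3, 5)))  # False
```

The first call satisfies y − x = 1 and β = 1/x; the second changes only β and already fails, which shows how knife-edge the equilibrium is.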
23. The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game. The set of feasible payoff pairs in a two-player strategic game can be represented graphically: the set of weighted averages of the points (x1, x2) and (y1, y2) is the line segment joining them. Represent the feasible payoff profiles of the following Prisoner's Dilemma game:
1/2  C  D
C  2,2  0,3
D  3,0  1,1
Solution: The set of all feasible payoff profiles is the quadrilateral with vertices (1,1), (3,0), (2,2) and (0,3), including its interior.
24. Here we define Bertrand's version of the duopoly game.
A single good is produced by n firms; each firm i can produce $q_i$ units of the good at a cost of $C_i(q_i)$. A demand function D is also defined, whose interpretation is that if the good is available at the price p then the total amount demanded is D(p).
Assume that if the firms set different prices then all consumers purchase the good from the firm with the lowest price, which produces enough output to meet this demand. If more than one firm sets the lowest price, all the firms doing so share the demand at that price equally. A firm whose price is not the lowest price receives no demand and produces no output. (Note that a firm does not choose its output strategically; it simply produces enough to satisfy all the demand it faces, given the prices, even if its price is below its unit cost, in which case it makes a loss.)
Firm i's preferences are represented by its profit, equal to $p_i D(p_i)/m - C_i(D(p_i)/m)$ if firm i is one of m firms setting the lowest price (m = 1 if firm i's price $p_i$ is lower than every other price), and equal to zero if some firm's price is lower than $p_i$.
Now suppose there are two firms and each firm's unit cost is a constant equal to c. Let $f(p) = (p - c)D(p)$ for every price p, and assume D is such that f is a continuous function with a single maximiser, denoted $p^m$.
Let $s_i$ be the strategy of firm i in the indefinitely repeated game that charges $p^m$ in the first period and subsequently as long as the other firm continues to charge $p^m$, and punishes any deviation from $p^m$ by the other firm by choosing the price c for k periods, then reverting to $p^m$. Given any value of β, for what values of k is the strategy pair $(s_1, s_2)$ a Nash equilibrium of the indefinitely repeated game?
Solution: Suppose that firm i uses the strategy $s_i$. If the other firm, j, uses $s_j$, then its discounted average payoff is
$\tfrac{1}{2}(1-\beta)[f(p^m) + \beta f(p^m) + \dots] = \tfrac{1}{2} f(p^m)$
If, on the other hand, firm j deviates to a price p, then the closer this price is to $p^m$, the higher is j's profit, because the punishment does not depend on p. Thus by choosing p close enough to $p^m$ the firm can obtain a profit as close as it wishes to $f(p^m)$ in the period of its deviation. Its profit during its punishment in the following k periods is zero. Once its punishment is complete, it can either revert to $p^m$ or deviate once again. If it can profit from deviating initially then it can profit by deviating once its punishment is complete, so its maximal profit from deviating is
$(1-\beta)[f(p^m) + \beta^{k+1} f(p^m) + \beta^{2k+2} f(p^m) + \dots] = (1-\beta) f(p^m)/(1-\beta^{k+1})$
Thus for $(s_1, s_2)$ to be a Nash equilibrium we need
$(1-\beta)/(1-\beta^{k+1}) \le 1/2$
or
$\beta^{k+1} - 2\beta + 1 \le 0$
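The final inequality can be solved for k numerically. The sketch below is illustrative and the function name is hypothetical. Since $\beta^{k+1}$ falls as k grows, the condition holds for every k at or above the smallest one that works, and it can never hold when β ≤ 1/2, because then $1 - 2\beta \ge 0$ while $\beta^{k+1} > 0$.

```python
def min_punishment_length(beta):
    """Smallest k with beta**(k+1) - 2*beta + 1 <= 0, or None if no k works."""
    if not 0.5 < beta < 1:
        return None  # for beta <= 1/2 the left-hand side is always positive
    k = 1
    while beta ** (k + 1) - 2 * beta + 1 > 0:
        k += 1
    return k

for beta in (0.5, 0.6, 0.9):
    print(beta, min_punishment_length(beta))
```

More patient firms need shorter punishments: the loop stops at k = 3 for β = 0.6 and at k = 2 for β = 0.9.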
25. Consider the previous game.
Let $s_i$ be the following strategy for firm i in the infinitely repeated game:
 in the first period charge the price $p^m$;
 in every subsequent period charge the lowest of all the prices charged by the other firms in all previous periods.
Is the strategy pair $(s_1, s_2)$ a Nash equilibrium of the infinitely repeated game for any value of β less than 1?
Solution: Suppose that firm i uses the strategy $s_i$. If the other firm does so then its discounted average payoff is $\tfrac{1}{2} f(p^m)$, as in the previous question. If the other firm deviates to some price p with $c < p < p^m$ in the first period, and maintains this price subsequently, then it obtains f(p) in the first period and shares f(p) in each subsequent period, so that its discounted average payoff is
$(1-\beta)[f(p) + \tfrac{1}{2}\beta f(p) + \tfrac{1}{2}\beta^2 f(p) + \dots] = \tfrac{1}{2}(2-\beta) f(p)$
If p is close to $p^m$ then f(p) is close to $f(p^m)$ (because f is continuous). In fact, for any β < 1 we have 2 − β > 1, so that we can find $p < p^m$ such that $(2-\beta) f(p) > f(p^m)$. Hence the strategy pair is not a Nash equilibrium of the infinitely repeated game for any value of β.
26. In the previous question, suppose that firm i cannot detect a deviation until $k_i \ge 1$ periods have passed (for i = 1 or i = 2). Let $s_i$ be the strategy of firm i that charges p* until a deviation is detected and the price c subsequently. Find, as a function of β, the prices p* for which the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium.
Solution: The best deviations involve prices slightly less than p*. Such a deviation by the rival of firm i, which goes undetected for $k_i$ periods, yields a discounted average payoff close to
$(1-\beta)[f(p^*) + \beta f(p^*) + \dots + \beta^{k_i - 1} f(p^*)] = (1-\beta^{k_i}) f(p^*)$
whereas compliance with the strategy yields the discounted average payoff $\tfrac{1}{2} f(p^*)$. Thus the strategy pair is a subgame perfect equilibrium for any value of p* if $\beta^{k_1} \ge 1/2$ and $\beta^{k_2} \ge 1/2$, and is not a subgame perfect equilibrium for any value of p* if $\beta^{k_1} < 1/2$ or $\beta^{k_2} < 1/2$. That is, the most profitable price for which the strategy pair is a subgame perfect equilibrium is $p^m$ if $\beta^{k_1} \ge 1/2$ and $\beta^{k_2} \ge 1/2$, and is c if $\beta^{k_1} < 1/2$ or $\beta^{k_2} < 1/2$.
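The detection-lag condition is easy to evaluate for given numbers. This is a minimal sketch with hypothetical function names; `critical_lag` returns the longest lag that still satisfies $\beta^k \ge 1/2$ for a discount factor strictly between 0 and 1.

```python
def collusion_sustainable(beta, k1, k2):
    """True if both detection lags satisfy beta**k >= 1/2."""
    return beta ** k1 >= 0.5 and beta ** k2 >= 0.5

def critical_lag(beta):
    """Largest integer k with beta**k >= 1/2 (0 if even a lag of 1 is too long)."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    k = 0
    while beta ** (k + 1) >= 0.5:
        k += 1
    return k

print(critical_lag(0.9), collusion_sustainable(0.9, 6, 3), collusion_sustainable(0.9, 7, 3))
```

For β = 0.9 the longest admissible lag is 6 periods, since $0.9^6 \approx 0.53$ while $0.9^7 \approx 0.48$.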
27. In the previous question, suppose that before the firms start choosing prices they simultaneously choose detection technologies. Each firm i may choose any positive integer $k_i$ or θ. The cost of choosing θ is zero, while the cost of choosing a positive integer is positive and decreasing in the value of the integer. Study the subgame perfect equilibria of the entire game in which firm i uses either $s_i$ or the strategy that chooses c in every period regardless of history.
Solution: Denote by $k_i^*$ the critical value of $k_i$ found in the previous question (that is, $\beta^{k_i^*} \ge 1/2$ and $\beta^{k_i^* + 1} < 1/2$). If $k_i > k_i^*$ then no change in $k_j$ affects the outcome of the price-setting subgame, so j's best action at the start of the game is θ, in which case i's best action is the same. Thus in one subgame perfect equilibrium both firms choose θ at the start of the game, and c regardless of history in the rest of the game.
If $k_i \le k_i^*$ then j's best action is $k_j^*$ if the cost of choosing $k_j^*$ is at most $\tfrac{1}{2} f(p^m)$. Thus if the cost of choosing $k_i^*$ is at most $\tfrac{1}{2} f(p^m)$ for each firm then the game has another subgame perfect equilibrium, in which each firm i chooses $k_i^*$ at the start of the game and the strategy $s_i$ in the price-setting subgame.
28. In the previous question, does a promise to beat the price charged by another firm promote or inhibit competition?
Solution: A promise by firm i to beat another firm's price is an inducement for consumers to inform firm i of deviations by other firms, and thus reduces its detection time. To this extent, such a promise tends to promote collusion and so inhibit competition.