Dissertation Abstract



No 124295
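Author (kanji) 山森,哲雄
Author (romanized)
Author (kana) ヤマモリ,テツオ
Title (Japanese) 学習過程と選好進化について
Title (English) Essays on Learning and Preference Evolution in Games
Report number 124295
Report number 甲24295
Date degree conferred 2009.03.11
Degree category Course doctorate (課程博士)
Degree type Doctor (Economics)
Diploma number 博経第246号
Graduate school Graduate School of Economics
Major Economic Theory
Dissertation committee Chair: The University of Tokyo, Professor 松井,彰彦
 The University of Tokyo, Professor 神谷,和也
 The University of Tokyo, Associate Professor 佐々木,弾
 The University of Tokyo, Professor 松島,斉
 The University of Tokyo, Associate Professor 松村,敏弘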
Abstract

In many situations, even when people enter into strategic interactions with others, they follow a simple decision rule: they react to the environment without elaborating a strategy, because their rationality is bounded for various reasons. People cannot always deduce their opponents' actions at a given point in time, since they often do not know the opponents' preferences or cannot gauge how smart they are. Nor can they always calculate a best response, since they often do not know all the actions available to them or the payoffs of these actions.

Economists have formally modeled the dynamics of human behavior, which describe how people learn to satisfy their preferences under bounded rationality. The earliest learning process appeared in Cournot's study of duopoly (1838). In his simultaneous best-reply dynamics, in each period, each firm chooses the quantity that maximizes its profit under the assumption that the other firms continue to choose their quantities of the previous period. Brown (1951) introduced a more sophisticated learning process called fictitious play. In this dynamics, in each period, each player predicts that the probability distribution of the opponents' play is the empirical frequency distribution of their past play, and simply chooses a best reply to it. A more sophisticated version of fictitious play was studied by Milgrom and Roberts (1990, 1991).
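As an illustration only, fictitious play for a two-player game might be sketched as follows. The function name `fictitious_play`, the bimatrix representation (`A` and `B` as nested lists), the fixed initial actions, and the tie-breaking rule (lowest index) are assumptions made for this sketch and are not taken from the thesis.

```python
def fictitious_play(A, B, periods=1000, start=(0, 0)):
    """Two-player fictitious play on a bimatrix game (illustrative sketch).

    A[i][j] and B[i][j] are the row and column player's payoffs when the
    row player chooses i and the column player chooses j. Returns the
    empirical frequencies of play for both players.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Counts of past play, seeded with one arbitrary initial action each
    # (an assumption of this sketch).
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[start[0]] += 1
    col_counts[start[1]] += 1
    for _ in range(periods):
        # Payoff of each action against the opponent's empirical counts.
        # Dividing by the total count would not change the maximizer.
        row_values = [sum(A[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(B[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Each player best-replies to the belief; ties go to the lowest index.
        i_star = max(range(n_rows), key=row_values.__getitem__)
        j_star = max(range(n_cols), key=col_values.__getitem__)
        row_counts[i_star] += 1
        col_counts[j_star] += 1
    total = periods + 1
    return ([c / total for c in row_counts],
            [c / total for c in col_counts])
```

Calling `fictitious_play(A, B)` with the payoff matrices of a small game returns the two empirical distributions, which can then be compared with the game's Nash equilibria.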

The main concern of these studies of learning in games is whether, and how, players can learn to play a Nash equilibrium under a particular learning process. In a general class of games, a learning process does not necessarily converge to a Nash equilibrium. Therefore, much attention has been devoted to finding classes of games in which the learning process converges to some Nash equilibrium. For example, Robinson (1951), Miyasawa (1961), Krishna (1991), and Monderer and Shapley (1996) found several classes of games in which fictitious play converges to a Nash equilibrium, whereas Shapley (1964) provided a game in which it does not converge.

This thesis studies three types of dynamics of human behavior. Two of them are learning processes called the best-reply dynamics and the better-reply dynamics. The third is a learning process in which the players' preferences, in addition to their strategies, are adjusted over time. The following setting is common to all three: a finite number of myopic players repeatedly play a game in discrete time. Players cannot always change their strategies; instead, each player randomly receives an opportunity to revise his/her strategy in each period. The probability of a revision opportunity is strictly between 0 and 1, and is independent over time and across players.

In the best-reply dynamics, if the current strategy of a player with a revision opportunity maximizes his/her payoff given the opponents' strategies, then he/she continues to choose that strategy. Otherwise, he/she switches to one of the best replies to the current strategy profile with equal probability. This dynamics is similar to the best-response dynamics introduced by Gilboa and Matsui (1991), but the two differ: they consider a continuum of players and assume that a deterministic fraction of the players always revise their strategies, whereas this thesis considers a finite number of players with random revision opportunities.
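The revision rule described above can be sketched in code. This is a minimal sketch under stated assumptions, not the thesis's formal model: the function name `best_reply_dynamics`, the `payoffs(i, profile)` callback, the integer encoding of strategies, and the default revision probability of 0.5 are all hypothetical choices made for illustration.

```python
import random


def best_reply_dynamics(payoffs, n_strategies, start, p_revise=0.5,
                        max_periods=10_000, seed=0):
    """Simulate best-reply dynamics with random revision opportunities.

    payoffs(i, profile) returns player i's payoff at a pure strategy profile
    (a tuple of strategy indices). Returns (profile, period) once a pure Nash
    equilibrium is reached, or (profile, None) if max_periods elapse first.
    """
    rng = random.Random(seed)
    profile = tuple(start)
    n = len(profile)

    def best_replies(i, prof):
        # Player i's payoff from each own strategy, opponents held fixed.
        values = {s: payoffs(i, prof[:i] + (s,) + prof[i + 1:])
                  for s in range(n_strategies[i])}
        top = max(values.values())
        return [s for s, v in values.items() if v == top]

    for t in range(max_periods):
        replies = [best_replies(i, profile) for i in range(n)]
        if all(profile[i] in replies[i] for i in range(n)):
            return profile, t  # every player already best-replies
        new = list(profile)
        for i in range(n):
            # Independent revision opportunity; a player whose current
            # strategy is already a best reply keeps it.
            if rng.random() < p_revise and profile[i] not in replies[i]:
                new[i] = rng.choice(replies[i])  # uniform over best replies
        profile = tuple(new)
    return profile, None
```

Because revision opportunities are independent across players, two miscoordinated players in a 2x2 coordination game do not switch in lockstep forever: in some period exactly one of them revises, and play reaches an equilibrium.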

Several classes of games have been found to have the global convergence property under the best-reply dynamics: the sequence of the strategy profiles from any initial strategy profile almost surely converges to some Nash equilibrium. Kandori and Rob (1995) showed that the best-reply dynamics globally converges to some Nash equilibrium in every finite, symmetric, strict supermodular game with totally ordered strategy sets. Kukushkin (2004) obtained the same global convergence result in games with additive aggregation.

Chapter 2 introduces the pure Nash equilibrium property (PNEP) as a sufficient condition for global convergence under the best-reply dynamics. A game has the PNEP if a pure-strategy Nash equilibrium exists in every game constructed by restricting each player's strategy set in the original game to any of its subsets. Any finite ordinal potential game and any finite supermodular game have the PNEP. We show that any finite two-player game with the PNEP has the global convergence property under the best-reply dynamics.
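For small two-player games, the PNEP can be checked by brute force. The sketch below is only an illustration: the names `has_pure_nash` and `has_pnep`, the bimatrix representation, and the restriction to nonempty subsets are assumptions of this sketch rather than definitions from the thesis. The number of restricted games grows exponentially with the strategy sets, so this is practical only for very small games.

```python
from itertools import chain, combinations, product


def nonempty_subsets(items):
    """All nonempty subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k)
                               for k in range(1, len(items) + 1))


def has_pure_nash(A, B, rows, cols):
    """True if the game restricted to rows x cols has a pure Nash equilibrium."""
    for i, j in product(rows, cols):
        row_ok = all(A[i][j] >= A[k][j] for k in rows)  # no better row
        col_ok = all(B[i][j] >= B[i][l] for l in cols)  # no better column
        if row_ok and col_ok:
            return True
    return False


def has_pnep(A, B):
    """Brute-force PNEP check for a bimatrix game (A: row payoffs, B: column)."""
    rows, cols = range(len(A)), range(len(A[0]))
    return all(has_pure_nash(A, B, R, C)
               for R in nonempty_subsets(rows)
               for C in nonempty_subsets(cols))
```

For example, a common-payoff coordination game passes the check, while matching pennies fails it because the unrestricted game already has no pure-strategy equilibrium.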

It is implicitly assumed that each player is able to calculate the best replies to the current strategy profile in the best-reply dynamics. In many situations, however, a player may be unable to perform this task since he/she often does not know all the actions that are available to him/her or the payoffs of these actions. The following better-reply dynamics is suitable for modeling human behavior under such a situation.

In the better-reply dynamics, each player with a revision opportunity picks one strategy from his/her strategy set with equal probability and compares his/her current payoff with the payoff that he/she would receive by adopting the new strategy against the current strategy profile. He/she switches to the new strategy if and only if it gives a strictly higher payoff.
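A minimal sketch of this sampling rule follows. As before, the function name `better_reply_dynamics`, the `payoffs(i, profile)` callback, the integer encoding of strategies, and the default revision probability are hypothetical choices for illustration; the Nash check inside the loop is used only to stop the simulation and is not information available to the players.

```python
import random


def better_reply_dynamics(payoffs, n_strategies, start, p_revise=0.5,
                          max_periods=10_000, seed=0):
    """Simulate better-reply dynamics with random revision opportunities.

    payoffs(i, profile) returns player i's payoff at a pure strategy profile
    (a tuple of strategy indices). Returns (profile, period) once a pure Nash
    equilibrium is reached, or (profile, None) if max_periods elapse first.
    """
    rng = random.Random(seed)
    profile = tuple(start)
    n = len(profile)

    def deviate(prof, i, s):
        return prof[:i] + (s,) + prof[i + 1:]

    def is_nash(prof):
        # Stopping rule for the simulation only.
        return all(payoffs(i, deviate(prof, i, s)) <= payoffs(i, prof)
                   for i in range(n) for s in range(n_strategies[i]))

    for t in range(max_periods):
        if is_nash(profile):
            return profile, t
        new = list(profile)
        for i in range(n):
            if rng.random() < p_revise:
                # Sample one strategy uniformly; adopt it only if it pays
                # strictly more against the current profile.
                candidate = rng.randrange(n_strategies[i])
                if (payoffs(i, deviate(profile, i, candidate))
                        > payoffs(i, profile)):
                    new[i] = candidate
        profile = tuple(new)
    return profile, None
```

Unlike the best-reply rule, a revising player here evaluates only the single sampled strategy, so he/she needs to know neither the full strategy set's payoffs nor which strategy is optimal.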

Note that every sequence of strategy profiles that can occur under the best-reply dynamics can also occur under the better-reply dynamics, since a best reply that differs from the current strategy is in particular a better reply. Therefore, any game that has the global convergence property under the best-reply dynamics also has the same property under the better-reply dynamics.

Chapter 3 focuses on quasi-supermodular games, which include supermodular games, and investigates global convergence under both the better- and best-reply dynamics. We show the following global convergence results: every quasi-supermodular game has the global convergence property under the better-reply dynamics, and every quasi-supermodular game with totally ordered strategy sets has the global convergence property under the best-reply dynamics.

These two results strengthen the result of Friedman and Mezzetti (2001), who showed that the better-reply dynamics globally converges to some Nash equilibrium in every supermodular game with totally ordered strategy sets. In the first result, we relax their assumption that strategy sets are totally ordered. In the second result, we maintain the total-order assumption, but show global convergence under the best-reply dynamics instead of the better-reply dynamics. Furthermore, in both results we weaken supermodularity to quasi-supermodularity.

The players' preferences, which are given by the payoffs of the underlying game, do not change over time in the best- and better-reply dynamics. In contrast, Chapter 4 studies a dynamic process in which players learn to behave on the basis of their preferences, which are in turn shaped by natural selection. Here, the payoffs of the underlying game represent not the players' preferences but their fitness. In each period, as in the best-reply dynamics, each player with a revision opportunity plays a best reply to the current strategy profile in terms of his/her preference, which need not match the underlying fitness. After the strategy profile of the next period is determined, some players are randomly selected. If a selected player does not have the highest fitness among the players, his/her preference is replaced by a new one. Such a dynamic process is called preference evolution, or the indirect evolutionary approach.

Studies on preference evolution in strategic environments have focused on the linkage between people's fitness and whether or not their preferences can serve as credible commitment devices. For example, even if a player's preference does not match the underlying fitness, when his/her preference is observable and the players are rational, he/she may gain higher fitness than those whose preferences correspond to the underlying fitness by committing to a certain strategy. Ultimately, the offspring who inherit such a preference come to dominate the society (e.g., Güth and Yaari (1992), Güth (1995), Bester and Güth (1998)). In contrast, when the population is large and the players' preferences are not observable, preferences that do not match the underlying fitness lose their capacity to make commitments. Consequently, the preferences of the surviving players are consistent with the underlying fitness (e.g., Ok and Vega-Redondo (2001)).

The preference evolution model studied in Chapter 4 shows that preferences have important consequences for outcomes even if they do not serve as commitment devices. We focus on an underlying game with two actions, in which the players' common fitness exhibits economies of scale. Both states in which all players choose the same action are Nash equilibria. We first show that our dynamics globally converges to one of the Nash equilibria of the underlying game. In this sense, the players' preferences are irrelevant to the outcome of the underlying game. However, we show that if rare mutations are introduced into the process of preference evolution, the players' preferences may drift without affecting equilibrium behavior, and these drifts may influence equilibrium selection in the underlying game.

Note that the process of changing preferences is analogous to the process of replacing the manager of a firm in a production economy: the manager's preferences may differ from the firm's profits or the owner's preferences, and if the firm does not earn the highest profit among all its competitors, the owner dismisses the current manager and employs a new one. In Chapter 4, our model of preference evolution is applied to such an economic setting.

Summary of Examination

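This thesis studies two learning processes, the best-reply dynamics and the better-reply dynamics, together with a preference evolution process in which not only the players' strategies but also their preferences change over time. In all three dynamic processes, multiple myopic players repeatedly face a given game. Players cannot change their strategies in every period, but are assumed to be able to do so with a certain fixed probability. When a player revises his/her strategy, in the best-reply dynamics he/she chooses a best reply to the current strategy profile, whereas in the better-reply dynamics he/she chooses an action that is preferable to the current one (though not necessarily optimal). Unlike these two learning processes, in the preference evolution process the payoffs of the game represent the players' fitness rather than their preferences, and players are not assumed to hold preferences that coincide with fitness. In the preference evolution process treated in this thesis, first, as in the best-reply dynamics, a player who revises his/her strategy chooses a best reply (in terms of his/her preference) to the current strategy profile. Next, the distribution of preferences changes according to the fitness that each player realizes under the new strategy profile.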

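Chapters 2 and 3 analyze the global convergence of the best-reply and better-reply dynamics, that is, the property that the sequence of strategy profiles converges to a Nash equilibrium from any initial condition. In general games, the best-reply and better-reply dynamics do not necessarily converge globally to a Nash equilibrium. Note also that if the best-reply dynamics has the global convergence property, then so does the better-reply dynamics, but the converse does not hold.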

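Chapter 2 presents the "pure Nash equilibrium property (PNEP)" as a sufficient condition for the best-reply dynamics to converge to a Nash equilibrium. A game satisfies the PNEP if a pure-strategy Nash equilibrium exists in every game constructed by restricting each player's strategy set to an arbitrary subset of it. The thesis proves that the best-reply dynamics converges globally to a Nash equilibrium in every finite two-player game with the PNEP.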

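Chapter 3 focuses on quasi-supermodular games as a sufficient condition for the best-reply and better-reply dynamics to converge to a Nash equilibrium. Quasi-supermodular games are a general formulation of situations that exhibit strategic complementarity, and they include a variety of models regarded as important in economics, such as Cournot's duopoly model, the Bertrand oligopoly model with differentiated products, and Diamond's search model. The thesis proves the following two results. (1) In quasi-supermodular games, the better-reply dynamics converges globally to a Nash equilibrium. (2) If the players' strategy sets are totally ordered, the best-reply dynamics converges globally to a Nash equilibrium in quasi-supermodular games. Result (1) does not assume that the strategy sets are totally ordered. It therefore covers, for example, models of competition among firms in which each firm competes in two or more dimensions, such as the price and the performance of its product.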

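Chapter 4 analyzes the problem of equilibrium selection under the preference evolution process. Existing research on preference evolution has focused on whether or not preferences function as commitments. A preference that does not aim at maximizing fitness may nevertheless end up maximizing fitness when it functions as a commitment; however, when preferences do not function as commitments, for instance because others' preferences cannot be observed, preferences that diverge from fitness were thought to have no effect on the outcome of the game. Focusing on coordination games with two Nash equilibria, the thesis proves that even when preferences do not function as commitments, they can affect the outcome of the game through the equilibrium selection process.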

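Of these analyses, Chapter 2 is joint work with 高橋悟 (Takahashi), and Chapter 3 is joint work with Takahashi and Kukushkin. Having closely followed the research process, the committee regards the two authors' contributions to Chapter 2 as equal. As for Chapter 3, the paper was written with equal contributions from the author and Takahashi and submitted to an academic journal, whereupon the editor instructed that it be made joint work with Kukushkin, who had been doing similar research; this is how it became a three-author paper. In recognition of their quality, Chapter 2 has been published or accepted for publication in Economics Bulletin, Chapter 3 in International Journal of Game Theory, and Chapter 4 in Japanese Economic Review.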

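In light of the above, the examination committee unanimously judged that this thesis attains a level sufficient for the conferral of a doctoral degree.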
