Strategic Exploration: Preemption and Prioritization

This paper provides a model of strategic exploration in which competing players independently explore a set of alternatives. The model captures a strategic tension between preemption—covertly exploring alternatives that the opponent will explore in the future—and prioritization—exploring the most promising alternatives. When the players are symmetrically capable in exploring, their equilibrium strategies are greedy. With asymmetric capacities, only the weak player's strategy is greedy, and the strong player captures a disproportionately larger payoff. The weak player conducts extensive instead of intensive exploration—covering as many alternatives as the strong player does but never fully exploring any. Coordination is impossible even when the alternatives are equally promising or when future gains are arbitrarily large. The model features a multiple-player multiple-armed bandit problem, and our results help explain how the strategic tension shapes equilibrium behavior and outcomes, e.g., in technology races between superpowers and R&D competitions between firms.


Introduction
This paper studies strategic exploration in which competing players covertly explore a set of alternatives to find good candidates. The strategic tension is between preemption and prioritization. On one hand, each player would like to preempt his opponents by exploring the alternatives before they do. On the other hand, he would also like to prioritize the most promising alternatives given his capacity constraint.
Such a game of strategic exploration arises, for example, in a technology race between superpowers, where the first country to discover the viable technology (such as nuclear weaponry, space technology, and more recently quantum computing) will gain a military, political, or economic advantage. The discovery and implementation of the technology involve exploring many different alternatives and conducting various experiments, not all of which are equally promising, and very few of which will lead to success. Therefore, each country must strategically allocate its capacity over alternatives and over time. We are interested in the following questions in the strategic exploration game.
• How would the tension between preemption and prioritization shape their strategic behavior and equilibrium dynamics? Should a country focus exclusively on the most promising route before moving on to less promising ones, or simultaneously explore alternatives with different prior likelihoods of success? Would the countries explore different alternatives to reduce duplicated exploration and alleviate competition?
• How would the players behave when one strong player has the capacity to explore more alternatives over the same period of time? In particular, should the weak player concentrate on fewer alternatives, or cover as much ground as the strong player which would inevitably lead to insufficient exploration of some or all alternatives given the resource constraint?
• How would the asymmetry in players' exploration capacities translate into asymmetry in their chances of discovery? In particular, can an edge in resources lead to a disproportionately large advantage in competition? What is the impact of moving resources from one country to the other (such as the migration of researchers)?
• How would the strategic considerations affect the overall process of discoveries? In particular, would the discovery arrive sooner if exploration capacities were more symmetrically distributed among players, and would a competitor with negligible resources have a negligible impact on the overall process of discovery?
• If the return from the technology grows over time, e.g., a safer version of the technology is available only in the future while its premature implementation has a larger chance of hazard, would players slow down their exploration in order to obtain a larger return? What is the benefit of hasty exploration or a head start, if any, when sharing a large return with an opponent outweighs monopolizing a small return?
• How would the dynamic behaviors and outcomes differ if more countries enter the competition? In particular, would ex ante symmetric countries necessarily adopt symmetric exploration strategies or gain equal chances of winning?
None of these questions is a priori obvious, but our model will provide clear answers as consequences of the strategic tension between preemption and prioritization. This strategic tension appears in many other dynamic search competitions where similar questions arise.
The alternatives in the model can be product designs, job opportunities, dating partners, etc. Granted, many other factors are at play in each of these applications; our focus is on the dynamic equilibrium exploration process itself, while previous literature has focused on different aspects of such applications. 1 We study the implications of the strategic tension between preemption and prioritization and develop modeling and analytical tools for this class of dynamic games.
We formulate the benchmark model in its simplest form. The unit interval represents the set of alternatives, at most one of which is good (the analysis and results hold verbatim for general spaces of alternatives and multiple independently and identically distributed good alternatives). Two players share a common prior over the likelihood of success of each alternative. In continuous time, they each face a capacity constraint on the measure of alternatives they can explore per unit time. Whoever finds the good alternative first receives a reward. The analysis and the main insights remain the same in an environment with gradual learning, where the outcome of each alternative arrives stochastically at a rate controlled by the resources allocated to it. Unlike the finite case, the continuum of alternatives makes the model tractable by eliminating the aggregate uncertainty of signal arrivals, thanks to the law of large numbers. The model thus encapsulates a useful but underexplored multiple-player multiple-armed bandit problem.
The strategic exploration game features a unique and simple equilibrium. Instead of concentrating on the most promising alternatives, the players explore an expanding set of alternatives with different prior likelihoods of success, in such a way that the posterior likelihoods equalize. The strategy preempts the opponent's future explorations to maximize the option value of exploration. It is also greedy in that it prioritizes alternatives with the highest posterior to maximize the myopic value of discovery. This strategy drives a wedge between equilibrium exploration and the coordinated exploration that minimizes the time to discovery.
Without preemption concerns, the fastest discovery would be achieved by prioritizing alternatives according to the prior probabilities of success. Our uniqueness result rules out divided or specialized explorations even when the prior distribution is spread out.
The same unique equilibrium obtains when the payoff from the good alternative increases over time. The equilibrium exploration is greedy, so the players fail to coordinate on waiting to let the payoff increase. Even though successful early exploration destroys the opportunity to obtain a larger payoff later, failed exploration gives the exploring player an informational advantage, as he will concentrate future explorations on the remaining alternatives. Even when the payoff increases discontinuously, this advantage always dominates the opportunity cost of early exploration, so the players explore at full capacity in a greedy way that leads to a low collective payoff. This highlights the strategic incentives behind a new form of coordination failure.
When a "strong" player (she) has the capacity to explore more alternatives per unit time, the weak player (he) still explores the alternatives with the highest posterior, but the strong player explores alternatives of unequal posterior, so her strategy is no longer greedy. The weak player always covers the same set of alternatives as the strong player, but he never explores any alternative with cumulative probability one, even though he could focus on a smaller set of alternatives. In other words, the weak player conducts extensive exploration instead of intensive exploration.
We show that an edge in exploration capacity gives the strong player a disproportionately large payoff advantage due to an endogenous informational advantage. When the exploration capacity is more asymmetric, the good alternative is discovered earlier in the first-order stochastic sense; nevertheless, the preemption incentive continues to play a non-vanishing role in slowing down discovery even when the weak player's capacity is vanishingly small.
We also study the strategic exploration game when more players enter the race. In the unique symmetric equilibrium, the players expand the set of alternatives to explore to equalize the posterior probabilities, just as in the benchmark setting. We show that the good alternative is discovered earlier with more players in the first-order stochastic dominance sense. However, there exist asymmetric equilibria with asymmetric payoffs even though the players are ex ante identical. Instead of competing over the same set of alternatives, the players specialize in different segments of alternatives in different groups.

Related literature
The paper connects several branches of active research. Optimal exploration of an unknown area is a well-known problem in operations research and computer science. 2 This literature has so far neglected the game-theoretic aspects of exploration. We do not consider the path dependence intrinsic to exploring physical locations, and we assume away switching costs. In our model, the alternatives can be research ideas, scientific experiments, job opportunities, etc., all of which are of economic interest.
Our notion of preemption extends well-known preemption models, e.g., Fudenberg and Tirole (1985), Hendricks and Wilson (1992), Abreu and Brunnermeier (2003), and Hopenhayn and Squintani (2011). Players make a single irreversible decision in these timing games, whereas in our model they make such a decision for each alternative. We introduce prioritization among multiple alternatives alongside the preemption motive, and the strategic tension between the two gives rise to different equilibrium dynamics. In contrast with these timing games, the preemptive equilibrium arises because of an informational advantage.
Fershtman and Rubinstein (1997) study a discrete-time finite-alternative search problem with preemption. As their analysis demonstrates, the discrete problem is intractable beyond a uniform prior. Under the uniform prior, however, the prioritization motive does not exist. 3 Matros, Ponomareva, Smirnov, and Wait (2019) consider a discrete-time continuum-alternative variant, but assume away the preemption motive, and thus find a qualitatively different equilibrium in pure strategies. 4 Hence, these two papers do not capture the dynamic trade-off between prioritization and preemption. Chatterjee and Evans (2004) embed a two-alternative model of treasure hunting in a dynamic R&D game with Poisson bandits. Klein and Rady (2011) analyze a continuous-time model of a negatively correlated bandit, in which one of the two arms contains a prize and the two players share a common value instead of competing with each other. Our model also features a negatively correlated bandit, but again, these two models do not capture the preemption-prioritization trade-off.
The canonical models of strategic experimentation by Keller, Rady, and Cripps (2005) and Bolton and Harris (1999) capture a trade-off between exploration and exploitation in a multiple-player setting and are useful in many economic applications. 5 Most models in this literature feature a one-armed bandit, with one risky alternative and a safe default one, thus abstracting away the rich set of alternatives to explore and precluding interesting learning, search, and innovation processes in many applications. By eliminating aggregate uncertainties and facilitating the analysis of randomization, the multiple-player continuum-armed bandit formulated in this paper overcomes some of the analytical difficulties associated with finite-armed bandit problems, which are largely intractable even when the arms are independent.
The strategic exploration game can also be viewed as a contest in which players choose what project to explore over time. Existing models in the literature of contests often study effort choices on a given project; see, e.g., Siegel (2009) and Fu, Lu, and Pan (2015) for recent development. 6 Our model also differs from Hotelling's spatial competition models, e.g., Osborne and Pitchik (1986) and Ottaviani and Sørensen (2003), in that the good alternative (or location) is fixed, the topology of the alternatives is payoff-irrelevant, and the players choose which alternatives to explore dynamically.

Model
In this section, we formulate the strategic exploration game.

Setup
Two players explore a continuum of alternatives x ∈ X := [0, 1], at most one of which is good. The prior distribution of the good alternative is given by a bounded and strictly positive density f. Therefore, the prior probability that the good alternative exists is π := ∫_X f(x) dx ∈ (0, 1]. The continuum of alternatives is an idealization of a large discrete set, so two alternatives labeled 0.1 and 0.11 need not give similar outcomes. The players explore over the time horizon [0, T], where T ≥ 1, without observing each other's explorations. With some abuse of notation, we also denote by T the set [0, T] when no confusion arises. Each player faces a capacity constraint: he can explore up to unit (Lebesgue) measure of alternatives per unit time. The first player to find the good alternative exclusively claims its prize—a payoff of 1—and the two split it equally in the case of simultaneous discovery. In all other cases, their payoffs are normalized to 0. There is no temporal discounting. Once the good alternative is found, the discovery is publicly announced and the game is over (alternatively, we may assume that the discovery is not made public but the prize is taken away from the good alternative once discovered).
5 See also Manso (2011) and a large literature on contractual incentive provisions under experimentation.
6 The literature also studies the optimal design of effort-maximizing contests; see, e.g., Moldovanu and Sela (2001; 2006) and Che and Gale (2003) for static problems and Bimpikis, Ehsani, and Mostagir (2019) and Halac, Kartik, and Liu (2017) for dynamic problems. Optimal design with our model will be an interesting direction for future research.

Strategy
We define the strategy space to describe how the players explore the alternatives over time.
The strategy space demands a formal treatment because the intuition of "exploring one alternative each period," inherited from the discrete problem, does not extend to a continuum of alternatives in continuous time. 7 Our definition of the strategy space exploits two ideas: the outcome-function approach overcomes the indeterminacy of continuous-time strategies, and the distributional approach handles randomization.

Pure Strategy
A pure strategy σ : T × X → {0, 1} specifies whether an alternative has been explored by a given time, i.e., an alternative x ∈ X is explored at or before t ∈ T if σ(t, x) = 1.
Definition 1. A pure strategy is a function σ : T × X → {0, 1} that satisfies:
1. Initial condition: σ(0, x) = 0 for all x ∈ X;
2. Monotonicity: σ(·, x) is non-decreasing and right-continuous for each x ∈ X;
3. Measurability: τ(x) := min{t : σ(t, x) = 1} defines a measurable function τ : X → T;
4. Capacity constraint: ∫_X (σ(t, x) − σ(s, x)) dx ≤ t − s for all 0 ≤ s ≤ t ≤ T.
7 First, the natural alternative, a measure-preserving bijection between continuous time and the continuum of alternatives, is not tractable. Second, a desirable definition should apply to an abstract set of alternatives, but the existence of measure-preserving bijections is not always guaranteed, let alone equilibria in such bijections. Third, the strategy should specify the set of alternatives explored at each moment in time. To avoid an uncountable union of measurable sets over time, we specify the outcome function that determines exploration activities.
The four conditions correspond to intuitive requirements on exploration activities. The initial condition states that none of the alternatives has been explored at the beginning of the game. The monotonicity condition requires that, once an alternative has been explored, it remains explored thereafter. The right-continuity property, similar to that of a cumulative distribution function, guarantees that the time at which an alternative x ∈ X is explored, τ(x) := min{t : σ(t, x) = 1}, is well defined. The measurability condition further requires that τ : X → T is a measurable function and that τ−1(t), the set of alternatives to be explored at time t, is a measurable set. It is the induced map τ−1 that instructs how the player should actually search. 8 Lastly, the capacity constraint describes how quickly a player can explore the space of alternatives. The maximum measure of alternatives explored per unit time is normalized to 1. We identify a strategy σ up to a stationary null set of X. 9
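To make Definition 1 concrete, the following sketch (our illustrative construction, not taken from the paper) encodes the simplest pure strategy, left-to-right exploration at full capacity, as an outcome function σ(t, x) = 1{x ≤ t}, and checks the initial, monotonicity, and capacity conditions on a grid.

```python
import numpy as np

# Illustrative pure strategy (our choice): explore [0, 1] from left to right
# at full capacity, so sigma(t, x) = 1 iff alternative x is explored by time t.
def sigma(t, x):
    return np.where(x <= t, 1, 0)

xs = np.linspace(0, 1, 1001)          # grid over the space of alternatives X

# Initial condition: nothing (beyond the single endpoint x = 0) is explored at t = 0.
assert sigma(0.0, xs[1:]).sum() == 0

# Monotonicity: once explored, an alternative stays explored.
assert np.all(sigma(0.7, xs) >= sigma(0.3, xs))

# Exploration time tau(x) = min{t : sigma(t, x) = 1}; here simply tau(x) = x.
tau = xs.copy()

# Capacity constraint: the measure of alternatives explored between s and t
# is at most t - s (here it binds, up to grid error).
s, t = 0.2, 0.5
measure = np.mean(sigma(t, xs) - sigma(s, xs))   # grid approximation of the integral
assert abs(measure - (t - s)) < 1e-2
```

The induced map τ−1 here is the instruction "at time t, explore alternative x = t," which is exactly the left-to-right search the outcome function encodes.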

Distributional Strategy
We define distributional strategies to capture randomized explorations. A distributional strategy ρ : T × X → [0, 1] specifies the probability ρ(t, x) that alternative x ∈ X has been explored by time t ∈ T.
Definition 2. A distributional strategy is a function ρ : T × X → [0, 1] that satisfies:
1. Initial condition: ρ(0, x) = 0 for all x ∈ X;
2. Monotonicity: ρ(·, x) is non-decreasing and right-continuous for each x ∈ X;
3. Measurability: ρ(t, ·) is measurable for all t ∈ T;
4. Capacity constraint: ∫_X (ρ(t, x) − ρ(s, x)) dx ≤ t − s for all 0 ≤ s ≤ t ≤ T.
The four conditions extend naturally from pure strategies. Comparing Definition 1 and Definition 2, a distributional strategy ρ reduces to a pure strategy if ρ(t, x) ∈ {0, 1}.
Unlike a pure strategy, a distributional strategy specifies only the cumulative probability of exploring each alternative, not the path of exploration. 10 We prove a representation theorem (Theorem 8) that shows the outcome-equivalence between distributional strategies and mixtures of pure strategies, and thus provides an indirect instruction for randomized explorations. We relegate the representation theorem, which relies on weak measurability and the Gelfand–Pettis integral, to Appendix A.9.

Remark 1. (Interpretations of Randomization)
In addition to literal randomized explorations, randomized strategies can be interpreted as the uncertainty entertained by the opponent, 11 or, alternatively, as the cumulative resource spent on each alternative so far, which we formalize in Section 5.1.

Payoff
We compute the expected payoff of each player under a profile of distributional strategies. We denote by −i player i's opponent. Given a profile of distributional strategies (ρ_i, ρ_−i), player i's expected payoff is

U_i(ρ_i, ρ_−i) = ∫_X f(x) ∫_T (1 − ρ_−i(t, x)) d_t ρ_i(t, x) dx + (1/2) ∫_X f(x) Σ_{t ∈ D_x} Δρ_i(t, x) Δρ_−i(t, x) dx.    (2.1)

The first term in (2.1) is player i's expected payoff from discovering the good alternative before his opponent. It reflects the probabilities of three events: alternative x is good with probability f(x), the opponent −i has not explored it yet with probability 1 − ρ_−i(t, x), and player i explores that alternative instantaneously with probability d_t ρ_i(t, x), where the time integral is the Lebesgue–Stieltjes integral with respect to the non-decreasing and right-continuous function ρ_i(·, x). 12
The second term in (2.1) is player i's expected payoff from simultaneously discovering the good alternative with his opponent. For each x, the set D_x ⊂ T is the at most countable set of common discontinuity points of ρ_i(·, x) and ρ_−i(·, x). At alternative x, the probability of simultaneous discovery at t ∈ D_x is the product of the two jumps, Δρ_i(t, x) Δρ_−i(t, x). The integral over X is well defined as the integrand can be written as the limit of measurable functions.
10 Abreu and Gul (2000) and Hendricks, Weiss, and Wilson (1988) use distributions to describe randomizations in continuous-time games. In addition to the one-dimensional distribution of stopping times in prior work, the distributional strategy in this paper features a continuum of alternatives.
11 See, e.g., Aumann (1987) for an exposition.
12 The Lebesgue–Stieltjes measure is obtained from µ((s, t]) = ρ_i(t, x) − ρ_i(s, x).

Equilibration of Preemption and Prioritization
We derive the unique Nash equilibrium of the strategic exploration game. A profile of distributional strategies is a Nash equilibrium if each player's strategy maximizes his expected payoff given the opponent's strategy.

Necessity of Randomization
We shall argue that, due to preemption motives, no player can play a pure strategy in a Nash equilibrium. Facing any pure strategy ρ_−i (e.g., a strategy that prioritizes alternatives according to their prior densities), player i can stay "one step ahead" of his opponent by exploring at time t what his opponent will explore at time t + ϵ. When ϵ is close to 0, player i's payoff from this response is close to π and his opponent's payoff is close to 0. Thus, player −i's payoff is 0 in the putative equilibrium. However, player −i can always imitate player i's equilibrium strategy to guarantee a strictly positive payoff.
Therefore, both players must randomize in any equilibrium.
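The one-step-ahead argument can be illustrated in a discretized toy version of the game (all discretization choices here are ours): N alternatives under a uniform prior, an opponent who explores alternative k in period k, and a deviator who always explores what the opponent would explore next period.

```python
def one_step_ahead_payoffs(N, pi=1.0):
    """Discretized 'one-step-ahead' deviation (our toy discretization).

    N alternatives, each good with prior probability pi / N.  The opponent
    plays the pure strategy 'explore alternative k in period k'; the deviator
    explores in period k the alternative the opponent would explore in period
    k + 1 (wrapping around to alternative 0 in the last period).  A player
    earns the prior mass of an alternative only if the rival has not explored
    it in an earlier period.
    """
    prior = pi / N
    explored_by_opp, explored_by_dev = set(), set()
    deviator = opponent = 0.0
    for k in range(N):
        opp_target, dev_target = k, (k + 1) % N
        if opp_target not in explored_by_dev:
            opponent += prior
        if dev_target not in explored_by_opp:
            deviator += prior
        explored_by_opp.add(opp_target)
        explored_by_dev.add(dev_target)
    return deviator, opponent

dev, opp = one_step_ahead_payoffs(100)
assert abs(dev - 0.99) < 1e-9    # deviator captures (N - 1) / N of pi
assert abs(opp - 0.01) < 1e-9    # preempted player keeps only pi / N
```

As N grows, the deviator's payoff approaches π and the opponent's approaches 0, mirroring the continuous-time argument with ϵ → 0.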

Belief Updating
We analyze how the players update their posterior beliefs in order to characterize equilibria with randomization. With prior density f and the opponent's distributional strategy ρ_−i, player i's posterior density that x is a good alternative right after t is g_−i(t, x) := (1 − ρ_−i(t, x))f(x). We call g_−i(t, x) player i's (unnormalized) posterior distribution over X at time t. We use the subscript "−i" because the posterior conditions only on the strategy of player −i.
The posterior does not take player i's own exploration in the past into account.
The initial condition of the distributional strategy in Definition 2 implies that g_−i(0, x) = f(x). The monotonicity condition entails that the posterior g_−i(t, x) is non-increasing in t for each x ∈ X. Intuitively, as the opponent −i explores more alternatives over time, the posterior distribution is pushed lower and lower. We observe that the posterior equals the flow payoff when the probability of simultaneous discovery is zero. In that case, the expected payoff reduces to

U_i(ρ_i, ρ_−i) = ∫_X ∫_T g_−i(t, x) d_t ρ_i(t, x) dx.

Intuitively, the flow payoff of exploring alternative x at time t is the probability that x is good and the opponent has not explored it yet. A notable sufficient condition for a zero probability of simultaneous discovery is that either ρ_i or ρ_−i is t-continuous.

Leveling Strategy
We shall construct a candidate equilibrium strategy such that the equilibrium posterior equals min{f(x), ḡ(t)} for all t ∈ T and x ∈ X. We call a function ḡ : T → R+ the leveling function if

∫_{x ∈ X : f(x) ≥ ḡ(t)} (1 − ḡ(t)/f(x)) dx = t    (3.4)

for all t ∈ [0, 1] and ḡ(t) = 0 for t > 1. At time t, the posterior achieves its maximum ḡ(t) on {x ∈ X : f(x) ≥ ḡ(t)} because it is bounded from above by the prior. As the integrand equals the distributional strategy, Equation (3.4) corresponds to the binding capacity constraint in Definition 2. We then define the leveling strategy ρ̄ : T × X → [0, 1] in terms of the leveling function:

ρ̄(t, x) := max{1 − ḡ(t)/f(x), 0}

for all t ∈ T and x ∈ X.
Lemma 1. The leveling function ḡ exists and is unique, absolutely continuous, convex, and strictly decreasing on [0, 1]. The leveling strategy ρ̄ is a well-defined distributional strategy.
With abuse of notation, we denote the posterior density induced by the leveling strategy by ḡ(t, x) := (1 − ρ̄(t, x))f(x) = min{f(x), ḡ(t)}, and call it the leveling posterior at t. We reiterate that it is player i's leveling strategy ρ̄ that levels player −i's posterior ḡ.
We demonstrate the relationship between the leveling strategy, the prior, and the leveling posterior in Figure 3.3, and illustrate the implementation of exploration over time.
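The leveling function has no closed form for a general prior, but it is easy to compute numerically. The sketch below (with an illustrative linear prior of our choosing) assumes the posterior-equalizing form ρ̄(t, x) = max{0, 1 − ḡ(t)/f(x)}, so that the binding capacity constraint ∫_X ρ̄(t, x) dx = t pins down ḡ(t), and solves for ḡ(t) by bisection.

```python
import numpy as np

def leveling_level(f_vals, t, iters=60):
    """Bisect for g_bar(t): the unique g >= 0 such that the explored measure
    integral of max(0, 1 - g / f(x)) over X equals t.  The integrand is the
    assumed leveling strategy rho_bar(t, x) = max(0, 1 - g_bar(t) / f(x))."""
    lo, hi = 0.0, float(f_vals.max())
    for _ in range(iters):
        g = 0.5 * (lo + hi)
        # Mean over a uniform grid on [0, 1] approximates the integral over X.
        explored = np.maximum(0.0, 1.0 - g / f_vals).mean()
        if explored > t:      # g too low: too much exploration implied
            lo = g
        else:
            hi = g
    return 0.5 * (lo + hi)

xs = np.linspace(0, 1, 2001)
f_vals = 0.4 + 0.4 * xs        # illustrative bounded, strictly positive prior

g_half = leveling_level(f_vals, 0.5)
g_late = leveling_level(f_vals, 0.9)
assert g_half > g_late > 0                      # g_bar strictly decreases on [0, 1]
assert leveling_level(f_vals, 1.0) < 1e-6       # everything explored by t = 1

# The capacity constraint binds at the solution ...
cap = np.maximum(0.0, 1.0 - g_half / f_vals).mean()
assert abs(cap - 0.5) < 1e-6
# ... and the induced posterior min{f, g_bar(t)} is level on the explored set.
posterior = np.minimum(f_vals, g_half)
assert np.isclose(posterior.max(), g_half)
```

The same bisection works for any bounded, strictly positive density, since the explored measure is continuous and strictly decreasing in g.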

Unique Equilibrium
We show that the symmetric leveling strategy profile is the unique Nash equilibrium of the strategic exploration game, and then discuss a number of its economic implications.
Theorem 1. The profile of leveling strategies (ρ̄, ρ̄) is the unique Nash equilibrium of the strategic exploration game, and each player's equilibrium payoff is π/2.
The proof is contained in Appendix A.2. The fact that (ρ̄, ρ̄) is an equilibrium follows from its construction. Since the leveling strategy is t-continuous, the flow payoff of exploring an alternative equals the posterior. As the leveling strategy prioritizes alternatives with the highest posterior at full capacity at all times, it always achieves the highest flow payoffs and thus the maximum expected payoff.
For uniqueness, we note that the leveling strategy ρ̄ guarantees payoff π/2 regardless of the opponent's strategy, i.e., each player can guarantee half of the total payoff. Therefore, it suffices to find, against each non-leveling strategy, a deviation with payoff above π/2. As shown in Figure 3.2, the leveling strategy explores at full capacity such that the posterior, or flow payoff, decreases uniformly over time. For any other strategy, there must exist an interval of time over which the posterior declines faster for one set of alternatives and slower for another. One can then modify the leveling strategy to preempt that strategy by prioritizing the former set at the expense of the latter, in the spirit of the "one-step-ahead" strategy in Section 3.1, to achieve a higher payoff.
We continue to remark on several features of the unique equilibrium.

Remark 2. (Greedy Strategy)
Despite dynamic considerations, the equilibrium strategy is greedy in that the player explores only alternatives x with the highest posterior (or flow payoff) ḡ(t) at each time t. Myopic best responses are determined solely by the posterior beliefs (the prioritization motive), whereas dynamic best responses in addition take into account how quickly posteriors decline (the preemption motive). The leveling posterior of the most promising alternatives declines at the same rate, so the leveling strategy is a best response both myopically and dynamically.

Remark 3. (Distortions in Exploration)
The strategic motives distort the equilibrium exploration in two ways relative to coordinated exploration, which prioritizes alternatives according to the prior density without preemption. First, the players prioritize the most promising alternatives a posteriori but not a priori. They explore many less promising alternatives before finishing more promising ones. Second, the players preempt each other by duplicating the opponent's exploration in an extreme way. From start to finish, each player only explores alternatives that his opponent could have already explored. Due to the two distortions, the equilibrium discovery time first-order stochastically dominates, i.e., is slower than, the coordinated counterpart. The instantaneous probability of discovery by one player at t equals the highest posterior ḡ(t) because of the greedy strategy. Therefore, the probability that a discovery is made by time t is 2 ∫_0^t ḡ(s) ds.
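The slowdown described in Remark 3 can be checked in closed form for a uniform prior. Assuming our derivation for f ≡ π (the leveling strategy then reduces to ρ̄(t, x) = t, so an alternative is unexplored by both independently randomizing players with probability (1 − t)²), the sketch below compares the equilibrium discovery probability with the coordinated benchmark in which the players split X and never duplicate.

```python
import numpy as np

pi = 0.8                        # total prior mass of the good alternative
ts = np.linspace(0, 1, 101)

# Equilibrium (both play the leveling strategy): with a uniform prior the
# leveling strategy reduces to rho_bar(t, x) = t (our closed form).  The good
# alternative is found by time t unless neither player has explored it:
#   P_eq(t) = pi * (1 - (1 - t)^2).
p_eq = pi * (1 - (1 - ts) ** 2)

# Coordinated benchmark: the players partition X, so measure min(2t, 1) is
# explored by time t and discovery is complete by t = 1/2.
p_coord = pi * np.minimum(2 * ts, 1.0)

# Equilibrium discovery time is first-order stochastically dominated:
# weakly slower at every t, strictly slower on the interior.
assert np.all(p_eq <= p_coord + 1e-12)
assert p_eq[50] < p_coord[50]                     # strict at t = 0.5
assert np.isclose(p_eq[-1], pi) and np.isclose(p_coord[-1], pi)
```

Algebraically, 1 − (1 − t)² = 2t − t² ≤ min(2t, 1), with equality only at t = 0 and t = 1, which is exactly the wedge between equilibrium and coordinated exploration.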

Remark 4. (Payoff Sharing Rule) Since the unique equilibrium strategy is t-continuous, Theorem 1 remains true for arbitrary payoff-sharing rules in the case of simultaneous discovery, and it continues to hold verbatim even if the discoverer enjoys a larger, but not exclusive, share of the prize. We emphasize that Remarks 4–6 are not merely robustness checks. They show that the model is parsimonious and captures the essence of the strategic tension in applications of strategic exploration.

Increasing Payoffs
In this section, we show that the strategy profile (ρ̄, ρ̄) is the unique Nash equilibrium even if the payoff from a later discovery is larger. In doing so, we demonstrate the payoff and information components behind the strategic tension between preemption and prioritization.
Increasing payoff over time is a salient feature in many applications. For example, two competing companies work on a drug with a rising price, or two countries compete on a technology that will become safer in the future. Our result shows that even with arbitrarily strong incentives to wait on exploration, the players fail to coordinate and explore greedily as in the case of constant payoff.
We model the increasing payoff by a strictly increasing time preference β : [0, T] → R+ common to the two players. When a player discovers the good alternative at time t, he enjoys payoff β(t), which increases over time. The expected payoff of player i is thus

U_i^β(ρ_i, ρ_−i) = ∫_X f(x) ∫_T β(t)(1 − ρ_−i(t, x)) d_t ρ_i(t, x) dx + (1/2) ∫_X f(x) Σ_{t ∈ D_x} β(t) Δρ_i(t, x) Δρ_−i(t, x) dx,

where D_x is the set of common discontinuity points of ρ_i(·, x) and ρ_−i(·, x), and Δρ(t, x) denotes the jump of ρ(·, x) at t. The benchmark model corresponds to the constant case β ≡ 1. The time preference accommodates not only a continuously increasing payoff as studied in preemption games, but also a discontinuous payoff, e.g., the return to discovering a new drug jumps discontinuously when a complementary patent expires.
We show that the players fail to coordinate on slower or delayed explorations to take advantage of the increasing payoff, even when the increment can be arbitrarily large and arrive arbitrarily soon.
Theorem 2. In the strategic exploration game with increasing time preference β, the profile (ρ̄, ρ̄) is the unique Nash equilibrium.
The prioritization and preemption motives shape the unique equilibrium. Under an increasing reward, early exploration can result in a payoff effect and an information advantage.
If it leads to a discovery, the player enjoys the current payoff but eliminates the possibility of a later discovery, which can be much more valuable. However, if early exploration fails, the player can preempt his opponent in future explorations by concentrating his capacity on the remaining alternatives. We show that the information advantage dominates the payoff effect so the players cannot coordinate on delayed explorations. Moreover, they reap the full information advantage by prioritizing the most promising alternatives a posteriori. Therefore, the unique equilibrium features the greedy leveling strategy.

Asymmetric Capacities
In this section, we analyze the strategic tension between prioritization and preemption under asymmetric capacity, and uncover a number of new interesting implications veiled in the symmetric problem.
The two players have different capacities: player 1 can explore measure 1 of alternatives per unit time but player 2 can only explore measure α ∈ (0, 1]. Player 1 (the "strong" player, she) is more capable or more resourceful than player 2 (the "weak" player, he) at exploration.
We refer to α as player 2's capacity. Thus, player 2's distributional strategy ρ_2^α : T × X → [0, 1] should satisfy the capacity constraint ∫_X (ρ_2^α(t, x) − ρ_2^α(s, x)) dx ≤ α(t − s) for all 0 ≤ s ≤ t ≤ T, in addition to the first three conditions in Definition 2.

Unique Equilibrium
We first introduce the equilibrium strategy of the weak player. Consider the distributional strategy αρ̄ that scales the leveling strategy by the factor α. The induced posterior (1 − αρ̄(t, x))f(x) decreases uniformly just like the original but at a slower speed. The fractional strategy, however, no longer levels the posterior, and explores any given alternative only with probability α by t = 1.
Figure 4.1: The strategy profile (ρ̄, αρ̄) and the corresponding posterior densities at a fixed time. In this example, player 2 is half as capable as player 1, i.e., α = 1/2.
Theorem 3. The profile of distributional strategies (ρ̄, αρ̄) is the unique Nash equilibrium of the game with asymmetric players. Player 1's equilibrium payoff is (1 − α/2)π and player 2's equilibrium payoff is (α/2)π.
Both the proof ideas and the formal proof for Theorem 3 are relegated to Appendix A.4.
We note that the strong player is able to explore all alternatives before her opponent can, rendering the weak player's strategy outcome-irrelevant afterwards in terms of discovery. In applications where exploration continues after a discovery is covertly made (one interpretation of the model), the equilibrium outcome is uniquely pinned down by (ρ̄, αρ̄).
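Theorem 3's payoff formulas can be verified directly for a uniform prior, where (our closed form) the leveling strategy reduces to ρ̄(t, x) = t and the weak player plays αρ̄. Since both strategies are continuous in t, each player's payoff is the time integral of his exploration rate times the posterior induced by the rival's strategy.

```python
import numpy as np

# Uniform prior with total mass pi; the leveling strategy then reduces to
# rho_bar(t, x) = t, and the weak player plays alpha * rho_bar (both our
# closed forms for this special case).
pi, alpha = 0.9, 0.5
ts = np.linspace(0, 1, 100001)

# Strong player: explores at rate 1; the weak player's strategy leaves her
# facing posterior pi * (1 - alpha * t) on the explored set.
u_strong = np.mean(pi * (1 - alpha * ts)) * 1.0
# Weak player: explores at rate alpha against the leveled posterior pi * (1 - t).
u_weak = np.mean(pi * (1 - ts)) * alpha

assert abs(u_strong - (1 - alpha / 2) * pi) < 1e-3   # Theorem 3: (1 - alpha/2) * pi
assert abs(u_weak - (alpha / 2) * pi) < 1e-3         # Theorem 3: (alpha/2) * pi
assert abs(u_strong + u_weak - pi) < 1e-3            # payoffs exhaust the prior mass
```

The last check also makes the payoff split (2 − α) : α of Remark 9 transparent: the two payoffs sum to π, with the strong player keeping the fraction 1 − α/2.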
We now elaborate on the economic implications of Theorem 3.

Remark 7. (Extensive vs. Intensive Exploration)
The preemption motive drives the weak player to conduct extensive, rather than intensive, exploration. With smaller capacity, the weak player randomizes over the same expanding set of alternatives, and thus cannot explore any single alternative with probability 1 before the strong player explores everything. This results from the preemption motive: if the weak player concentrated on a smaller set of alternatives, the strong player could preempt him by exploring this set more intensively; hence the two always explore the same set of alternatives in equilibrium.

Remark 8. (Greedy vs. Non-Greedy)
The preemption motive dominates the prioritization motive, so the strong player's equilibrium strategy is no longer greedy. As in the symmetric case, the preemption motive drives the two induced posteriors to decline uniformly on the same expanding set of alternatives. However, the weak player, who engages in extensive exploration, cannot equalize the posterior faced by the strong player, as shown in Figure 4.1. For the strong player, the dynamic motive of preemption overrides the myopic motive of prioritization. Since she plays the leveling strategy, the strong player's equilibrium strategy is leveling but not greedy, while the weak player's is greedy but not leveling.

Remark 9. (Disproportionate Payoff Shares)
The strong player enjoys a disproportionately larger share in payoff, (2 − α) : α, than in capacity, 1 : α. It is as if the strong player monopolizes a fraction 1 − α of the total payoff and then splits the remaining fraction with the weak player. For example, when the strong player is twice as capable, α = 1/2, she enjoys a threefold payoff. The excess payoff beyond the capacity share is due to an endogenous informational advantage: because the strong player rules out more alternatives in previous failed explorations, she can use her capacity more effectively by concentrating on the fewer remaining alternatives.
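To make the split concrete, here is a quick numerical sketch of the Theorem 3 payoffs (our own illustration, not code from the paper):

```python
def equilibrium_payoffs(alpha, pi=1.0):
    """Theorem 3 payoffs (strong, weak) for capacity ratio alpha = alpha2/alpha1."""
    return (1 - alpha / 2) * pi, (alpha / 2) * pi

strong, weak = equilibrium_payoffs(alpha=0.5)
# Capacity ratio is 1 : 1/2 = 2 : 1, yet the payoff ratio is (2 - alpha) : alpha = 3 : 1.
assert abs(strong / weak - 3.0) < 1e-12
# Equivalently: the strong player keeps fraction 1 - alpha outright and splits
# the remaining alpha evenly, ending up with (1 - alpha) + alpha/2 = 1 - alpha/2.
assert abs(strong - ((1 - 0.5) + 0.5 / 2)) < 1e-12
```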

Remark 10. (Impact of Migration of Research Personnel)
We examine the impact of migration of research personnel, which is the main component of exploration capacity in applications such as the technology race between superpowers. Although the total number of researchers can be assumed to be relatively stable, researchers migrate from one country to the other. To this end, suppose that the total exploration capacity is 1 and the weak player has a fraction θ < 1/2 of the total capacity. By Theorem 3, the weak player's equilibrium payoff share is θ/(2(1 − θ)). The elasticity of his payoff share with respect to his capacity share is 1/(1 − θ) > 1. Thus, migration always has an outsized impact on the weak player. For the strong player, the elasticity of her payoff share with respect to her own capacity share 1 − θ is 1/(3(1 − θ) − 1), and unit elasticity is attained when her capacity share is 2/3. Hence, although the strong player always benefits from immigration of researchers, the scale depends on her existing capacity.
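The elasticity claims can be checked numerically. The sketch below (our illustration under the Theorem 3 payoff shares, not the paper's code) differentiates the shares by finite differences:

```python
import math

def weak_share(theta):
    # Weak player's equilibrium payoff share from Theorem 3: theta / (2*(1 - theta))
    return 0.5 * theta / (1 - theta)

def strong_share(c):
    # Strong player's payoff share as a function of her capacity share c = 1 - theta
    return 1 - weak_share(1 - c)

def log_elasticity(fn, x, h=1e-6):
    # Numerical elasticity d ln fn / d ln x via central differences
    return (math.log(fn(x * (1 + h))) - math.log(fn(x * (1 - h)))) / (2 * h)

theta = 0.3
assert abs(log_elasticity(weak_share, theta) - 1 / (1 - theta)) < 1e-4
c = 0.7
assert abs(log_elasticity(strong_share, c) - 1 / (3 * c - 1)) < 1e-4
# Unit elasticity for the strong player at capacity share 2/3
assert abs(log_elasticity(strong_share, 2 / 3) - 1.0) < 1e-4
```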

Remark 11. (Endogenous Choice of Capacities)
The exact characterization of equilibrium payoffs facilitates the study of capacity investment in an augmented game. Suppose player i can choose a capacity α_i at constant marginal cost c_i before entering the strategic exploration game, where player 2 faces a higher marginal cost c_2 > c_1 > 0. It follows from Theorem 3 that, for α_1 ≥ α_2, the two players' equilibrium payoffs from the strategic exploration game (before paying the cost) are (1 − α/2)π and (α/2)π, respectively, where α := α_2/α_1. Therefore, the equilibrium capacities (α*_1, α*_2) satisfy the following conditions: It can be shown that α* := Only player 1 earns a strictly positive net profit, while player 2 dissipates his return from exploration through capacity investment. In the limit c_1 = c_2, both players' net payoffs are zero. It can be shown that the players enjoy positive net payoffs when the cost of investment is strictly convex.

Discovery Time
We investigate how equilibrium exploration depends on the asymmetry in capacity and show that asymmetry speeds up the process of discovery. We fix the total capacity at 2 (as in the symmetric case) and vary the asymmetry between the two players. Formally, let γ ∈ [1, 2) and consider the strategic exploration game in which the strong player has capacity γ and the weak player has capacity 2 − γ ∈ (0, 1]. We compute the distribution of discovery time in the unique equilibrium under asymmetric capacity. By Theorem 3, the strong player plays ρ_1(t, x) = ρ̄(γt, x), which levels the posterior at ḡ(γt), and the weak player plays ρ_2(t, x) = ((2 − γ)/γ) ρ̄(γt, x), which does not level the posterior. For t ≥ 1/γ, the strong player has exhausted all alternatives, so the probability of discovery by t is P_γ(t) = π. For t ∈ [0, 1/γ], the probability of discovery by t is given by The two terms correspond to a hypothetical sequential exploration. The first term is the probability of discovery by the strong player if she were to level the posterior before the weak player explored anything. The second term is the probability of discovery by the weak player if he were to expend his cumulative capacity (2 − γ)t on the leveled posterior.
Because the strong player always levels the posterior with all her capacity, the remaining capacity of the weak player replicates some of her exploration. As the strong player enjoys a larger share of capacity, there is less duplication and thus faster discovery. This property is illustrated in Figure 4.2.

Remark 12. As the strong player controls almost the total capacity, the duplication effect vanishes, but the equilibrium discovery remains discontinuously slower than the coordinated exploration (Figure 4.2) due to the preemption motive. To avoid preemption by the weak player, the strong player must randomize exploration according to the leveling strategy even as γ → 2. The randomization prevents her from prioritizing the most promising alternatives, either a priori or a posteriori, and thus slows down discovery compared to the fastest, coordinated exploration.
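The speed-up from asymmetry can be illustrated in the uniform-prior specialization (f ≡ 1, π = 1), where the leveling strategy reduces to uniform exploration, ρ̄(t, x) = t, so the strong and weak players explore each alternative with probabilities γt and (2 − γ)t. This is our own sketch under that assumption, not the paper's general formula:

```python
def discovery_cdf(gamma, t):
    """P(discovery by t) under a uniform prior (f = 1, pi = 1), for t <= 1/gamma.

    Discovery occurs once at least one player has explored the good
    alternative; explorations are independent across players."""
    assert 1 <= gamma < 2 and 0 <= t <= 1 / gamma
    return 1 - (1 - gamma * t) * (1 - (2 - gamma) * t)

# More asymmetry (larger gamma, same total capacity 2) => faster discovery (FOSD).
ts = [i / 100 for i in range(51)]
for g_lo, g_hi in [(1.0, 1.25), (1.25, 1.5), (1.5, 1.9)]:
    assert all(discovery_cdf(g_hi, t) >= discovery_cdf(g_lo, t) - 1e-12 for t in ts)

# Even as gamma -> 2, equilibrium discovery stays strictly slower than the
# coordinated exploration, which covers everything by t = 1/2 (P(t) = min(2t, 1)).
t = 0.25
assert discovery_cdf(1.99, t) < min(2 * t, 1.0)
```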

Extensions and Comparative Statics
Having analyzed strategic exploration in the simplest setting, we now work out three settings of interest, assuming symmetric capacities. These results are by no means exhaustive; rather, they serve as examples demonstrating how the strategic analysis of preemption and prioritization extends to other dynamic aspects of search and learning.

Poisson Learning
In this section, we consider the strategic exploration game where the players allocate resources to the alternatives to learn via a Poisson process. In other words, a player learns whether an alternative is good gradually instead of instantaneously as in the benchmark model.
Each player allocates his resource per unit time, normalized to 1, among the set of alternatives in order to find the good alternative. A conclusive signal arrives at a Poisson rate proportional to the flow rate of resource on the alternative, and reveals whether that alternative is good or not. Let r(t, x) be the cumulative amount of resource a player has spent on alternative x ∈ X by time t ∈ T, conditional on no signal arrival before t. The probability of signal arrival by time t is thus 1 − e^{−r(t,x)}. If r(·, x) is differentiable in t, the time derivative ∂_t r(t, x) is the arrival rate of the potentially non-stationary Poisson process associated with alternative x.
Analogous to a distributional strategy, a function r : T × X → R_+ ∪ {∞} is a resource allocation strategy if it satisfies the following four conditions: 1. Initial condition: r(0, x) = 0 for all x ∈ X; 2. Monotonicity and right-continuity: r(·, x) is increasing and right-continuous for all x ∈ X; 3. Measurability: r(t, ·) is measurable for all t ∈ T; 4. Capacity constraint: The first three conditions extend directly from distributional strategies, but we shall elaborate on the capacity constraint. The capacity constraint limits the expected resource expended, which equals the actual expenditure thanks to the law of large numbers. The actual amount of resource expended on alternative x by time t may not equal r(t, x), which conditions on no signal arrival; in fact, it is stochastic because no more resource will be spent once a signal arrives. Therefore, the expected amount of resource on x by t is where e^{−q} is the probability of no signal arrival given the cumulative resource q. As the arrival of Poisson signals is independent across alternatives, the law of large numbers implies that the actual resource expended across all alternatives equals its expectation. The continuum of alternatives is essential to this straightforward capacity constraint. In contrast, one needs to specify resource allocation for each history of signal arrivals if there are only finitely many alternatives. In that case, the stochastic signal arrivals impart aggregate uncertainty about the set of remaining alternatives, and the capacity constraint must restrict the actual resource expended, which no longer equals its expectation, for each realization of signal arrivals.
We derive the unique equilibrium of the strategic exploration game with Poisson learning by noting a one-to-one correspondence between resource allocation strategies and distributional strategies. Given player i's resource allocation strategy r_i, the probability of signal arrival from alternative x by time t is ρ_i(t, x) := 1 − e^{−r_i(t,x)} ∈ [0, 1]. With this one-to-one relationship between r_i and ρ_i, it is immediate that r_i is a resource allocation strategy if and only if ρ_i is a distributional strategy that satisfies the four conditions in Definition 2.
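A small simulation illustrates the correspondence. This is our own sketch: the cumulative resource path r(t) = 2t on a single alternative is a made-up example, and the inverse-transform simulation of a non-stationary Poisson first arrival is a standard technique, not taken from the paper:

```python
import math, random

# The bijection rho = 1 - exp(-r), inverted by r = -ln(1 - rho).
r = lambda t: 2.0 * t                          # hypothetical cumulative resource
rho = lambda t: 1.0 - math.exp(-r(t))          # induced distributional strategy
r_back = lambda t: -math.log(1.0 - rho(t))     # recover r from rho
assert abs(r_back(0.3) - r(0.3)) < 1e-12

# A non-stationary Poisson first arrival with cumulative rate r(t) can be
# simulated by inverse transform: arrival occurs by t0 iff E <= r(t0), E ~ Exp(1).
random.seed(0)
n, t0 = 200_000, 0.5
arrivals = sum(random.expovariate(1.0) <= r(t0) for _ in range(n))
# Empirical P(arrival by t0) should be close to rho(t0) = 1 - exp(-r(t0)).
assert abs(arrivals / n - rho(t0)) < 0.01
```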
Player i's expected payoff from a profile (r_i, r_{−i}) is the same as u_i(ρ_i, ρ_{−i}) as defined in Equation (2.1). Therefore, the game with Poisson learning is isomorphic to the main model with instantaneous arrival. Define the resource allocation strategy r̄ by r̄(t, x) := −ln(1 − ρ̄(t, x)), where ρ̄ is the leveling strategy. The following result is immediate from Theorem 1.

Corollary 1.
With Poisson learning, the profile of resource allocation strategies (r̄, r̄) is the unique Nash equilibrium.
We note that Remarks 4-6 remain valid here: the equilibrium characterization in resource allocation strategies is invariant to the space of alternatives, payoff-sharing rules, and the multiplicity of good alternatives.

Impact of Prior Beliefs
We study how the prior distribution affects the equilibrium exploration, and show that the good alternative is discovered more quickly if the prior is less evenly distributed.
With the probability of existence of the good alternative π fixed, we vary the evenness of the prior distribution. We first define the pushforward measure, which is the distribution of the prior density. Denote by λ the Lebesgue measure. For any prior distribution f, let λ ∘ f⁻¹ be the pushforward measure over R_+. Note that λ ∘ f⁻¹(R_+) = λ([0, 1]) = 1, so the pushforward measure is a probability measure. Its expectation is ∫ y d(λ ∘ f⁻¹)(y) = ∫_X f(x) dx. For example, when the good alternative is uniformly distributed over X, f(x) is a constant, so λ ∘ f⁻¹ assigns probability 1 to a single point. This is the case where the good alternative is most evenly distributed over X = [0, 1]. We can then capture the evenness of a prior distribution by its pushforward measure. Figure 5.1 illustrates the partial order of evenness.
Definition 3. Let f_1 and f_2 be two prior distributions. We say that f_2 is more even than

The good alternative is discovered more quickly if the prior distribution is less even. The players concentrate their exploration, which increases preemptive duplication but prioritizes more promising alternatives. We show that the prioritization effect dominates.
Theorem 5. If f_2 is more even than f_1, then the distribution of equilibrium discovery time associated with f_2 first-order stochastically dominates that associated with f_1, i.e., the good alternative is discovered earlier with f_1 than with f_2.
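As a numerical illustration of Theorem 5 (our own sketch, not the paper's computation), take two symmetric players who both play the leveling strategy, and compare a uniform prior f ≡ 1 with a hypothetical triangular prior f(x) = 2x, both with total mass π = 1; the uniform prior is the more even of the two. The discovery CDF is computed as π minus the remaining good mass ∫ (1 − ρ̄)² f dx:

```python
import math

def level_g(t, h):
    """Invert the capacity function h by bisection (h is strictly decreasing)."""
    lo, hi = 0.0, 2.0          # sup f = 2 for the triangular density below
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) > t else (lo, mid)
    return (lo + hi) / 2

def h_tri(y):
    # Capacity needed to level the posterior of f(x) = 2x on [0,1] down to y:
    # integral over {f >= y} of (1 - y/f(x)) dx, computed in closed form.
    return (1 - y / 2) - (y / 2) * math.log(2 / y) if y > 0 else 1.0

def cdf_tri(t):
    # P(discovery by t) with both players leveling: remaining good mass is
    # g^2/f on {f >= g} plus f on {f < g}, with g = level_g(t, h_tri).
    g = level_g(t, h_tri)
    return 1 - ((g**2 / 2) * math.log(2 / g) + g**2 / 4)

def cdf_uni(t):
    # Uniform prior f = 1: leveling is uniform exploration, so P = 1 - (1-t)^2.
    return 1 - (1 - t) ** 2

# Less even (triangular) prior => first-order stochastically faster discovery.
for t in [0.1, 0.3, 0.5, 0.7, 0.9]:
    assert cdf_tri(t) >= cdf_uni(t) - 1e-6
```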
The comparative statics is intuitive given our equilibrium characterization, but it is not obvious a priori. One may find Theorem 5 unsurprising since it also applies to coordinated exploration. We emphasize, however, our result that the players cannot coordinate on less duplicative exploration regardless of the prior distribution. A priori, one might expect the players to specialize on distinct sets of alternatives to speed up equilibrium exploration when the prior distribution is more even, and coordination to be more difficult when the prior is less even because of the strengthened incentive to prioritize the most promising alternatives. We have shown, however, that such coordination is impossible for any prior distribution. It fails even when the players have arbitrarily strong incentives to coordinate in the face of an increasing prize, as shown in Theorem 2. By ruling out all possible coordination, our equilibrium characterization makes it straightforward to prove the monotonicity in discovery time.

Multiple Players
Finally, we analyze the strategic exploration game with more than two players and derive the unique symmetric equilibrium, which is similar to the leveling equilibrium. We also show the existence of asymmetric equilibria by an example.
The setup generalizes naturally to multiple players. There are n > 2 players, each facing the same capacity constraint; therefore, the set of strategies available to each of them is still given by Definition 2. Denote the distributional strategy of player i by ρ_i. The probability that x is searched up to time t by at least one of player i's opponents is then 1 − ∏_{j≠i}(1 − ρ_j(t, x)). As in the case with two players, the posterior induced for player i is g_{−i}(t, x) = ∏_{j≠i}(1 − ρ_j(t, x)) f(x). With this notation, the payoff of player i is again given by Equation (2.1).
We consider the symmetric strategy profile such that the posterior g_{−i} is leveled for every player i. More precisely, let ḡ : T → [0, sup f] be the leveling function defined implicitly by for t ∈ [0, 1], and ḡ(t) = 0 for t > 1. The proof of existence and uniqueness of ḡ is analogous to the proof of Lemma 1. By Equation (5.1), the leveling strategy ρ̄ is a well-defined distributional strategy.
Theorem 6. The profile of distributional strategies (ρ̄, ..., ρ̄) is the unique symmetric Nash equilibrium of the game with n players.
Theorem 6 characterizes the unique equilibrium within the class of symmetric equilibria, but it does not establish uniqueness of Nash equilibrium, which indeed fails for n > 2. We present an example of an asymmetric equilibrium in which symmetric players enjoy unequal equilibrium payoffs.

Example 1.
Take the uniform prior f ≡ 1, T = 1, and n = 5 players. Partition the alternatives X = [0, 1] into two halves, X_1 := [0, 1/2) and X_2 := [1/2, 1], and the players into two groups, {1, 2} and {3, 4, 5}. For each player in the first group, the strategy is given as follows: For each player in the second group, the strategy is given as follows: That is, each player in the first group explores uniformly over the left half X_1 until the alternatives are exhausted at t = 1/2, and then explores the other half X_2. Each player in the second group explores in the reverse order. It can be verified that (ρ_1, ρ_1, ρ_2, ρ_2, ρ_2) is a Nash equilibrium of the 5-player game. Since discovery must occur before t = 1/2, the equilibrium exploration differs from the one described in Theorem 6, which has full support T = [0, 1]. Moreover, despite symmetric capacities, the equilibrium payoffs are asymmetric: players 1 and 2 enjoy an expected payoff of 1/4 while players 3, 4, and 5 have an expected payoff of 1/6.
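A back-of-the-envelope check of the payoffs in Example 1 (our own accounting sketch; it assumes the grouping consistent with the stated profile (ρ_1, ρ_1, ρ_2, ρ_2, ρ_2) — two players exploring X_1 first, three exploring X_2 first — and that, by symmetry, the players who reach a half first split its prior mass evenly):

```python
from fractions import Fraction as F

mass_X1, mass_X2 = F(1, 2), F(1, 2)   # prior mass of each half (f = 1, pi = 1)
group_A, group_B = 2, 3               # players on X1 first, players on X2 first

payoff_A = mass_X1 / group_A          # each player exploring X1 first
payoff_B = mass_X2 / group_B          # each player exploring X2 first

assert payoff_A == F(1, 4) and payoff_B == F(1, 6)
# Payoffs exhaust the total prize pi = 1 (the game is constant-sum):
assert group_A * payoff_A + group_B * payoff_B == 1
```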
Multiple equilibria arise because more than one opponent can preempt any given player.
Whenever a player explores an alternative in equilibrium, at least one of his opponents will explore the same alternative to preempt him. With only two players, the equilibrium exploration is unique because each player faces only one opponent. With more than two players, however, multiple opponents can preempt any given player. For t ∈ (0, 1/2) in Example 1, player 1 is preempted by player 2 on the left half and by all other players on the right half. Other players face similar situations, which sustain the non-leveling equilibrium.
In the unique symmetric equilibrium, the good alternative is discovered more quickly as the number of players increases, because the increased total capacity outweighs the additional duplication.

Theorem 7. In the class of symmetric equilibria, the distribution of discovery time is decreasing in n in the first-order stochastic dominance sense.
The result is rather intuitive; the non-trivial part is the ranking of discovery times in the first-order stochastic dominance, as the distributions of discovery time for all n players share the same support.
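For a concrete check of Theorem 7, consider the uniform-prior specialization (f ≡ 1, π = 1), in which the symmetric leveling profile reduces to uniform exploration, ρ̄(t, x) = t; this sketch is our illustration under that assumption:

```python
def discovery_cdf_n(n, t):
    """P(discovery by t) in the symmetric leveling equilibrium under a uniform
    prior: each of the n players explores each x w.p. t by time t, so the good
    alternative survives all n independent explorations w.p. (1 - t)**n."""
    return 1 - (1 - t) ** n

ts = [i / 20 for i in range(21)]
for n in range(2, 6):
    # More players => first-order stochastically earlier discovery (Theorem 7),
    # even though the support [0, 1] is the same for every n.
    assert all(discovery_cdf_n(n + 1, t) >= discovery_cdf_n(n, t) for t in ts)
    assert discovery_cdf_n(n, 0.0) == 0.0 and discovery_cdf_n(n, 1.0) == 1.0
```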

A.1 Proof of Lemma 1
Proof. Since ḡ is a constant function on t > 1, it suffices to prove the lemma on t ∈ [0, 1].
The integrand of h(y) := ∫_X (1 − y/f(x)) 1_{f(x)≥y} dx is decreasing in y, and strictly so where f(x) > y, which has positive measure for y < sup f. Thus, h is strictly decreasing. In addition, the integrand is continuous, and therefore h is continuous by the dominated convergence theorem.
The convexity of h also follows from that of the integrand.

The function h is continuous and strictly decreasing with h(0) = 1 and h(sup f ) = 0.
Therefore, there exists a unique, continuous, and strictly decreasing function ḡ = h⁻¹ that solves Equation (3.4). Since h is convex and strictly decreasing, its inverse ḡ is also convex. The absolute continuity of ḡ follows from its continuity and convexity.
We verify that ρ̄ is a well-defined distributional strategy. It is straightforward to check that ρ̄ satisfies the initial condition. The function (x, y) ↦ (1 − y/f(x)) 1_{f(x)≥y} is continuous and decreasing in y. Together with the continuity and monotonicity of ḡ, this property implies that ρ̄ is continuous in t and satisfies the monotonicity and right-continuity condition.
The function is also measurable in x, and hence ρ̄ satisfies the measurability condition.
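The construction above can be checked numerically. The sketch below assumes h(y) = ∫_X (1 − y/f(x)) 1_{f(x)≥y} dx, the form consistent with the integrand discussed in the proof, takes the hypothetical example density f(x) = 3x², and verifies the endpoint and monotonicity properties before recovering ḡ = h⁻¹ by bisection:

```python
f = lambda x: 3 * x**2   # example prior density on [0,1]: sup f = 3, mass 1

def h(y, m=20000):
    # Midpoint-rule quadrature of (1 - y/f(x)) over the set {x : f(x) >= y}.
    return sum(1 - y / f((k + 0.5) / m)
               for k in range(m) if f((k + 0.5) / m) >= y) / m

ys = [3 * k / 50 for k in range(51)]
vals = [h(y) for y in ys]
assert abs(vals[0] - 1.0) < 1e-6 and abs(vals[-1]) < 1e-6  # h(0) = 1, h(sup f) = 0
assert all(a > b for a, b in zip(vals, vals[1:]))          # strictly decreasing

# The leveling function g-bar = h^{-1} is then recovered by bisection:
def g_bar(t, lo=0.0, hi=3.0):
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) > t else (lo, mid)
    return (lo + hi) / 2

assert abs(h(g_bar(0.4)) - 0.4) < 1e-3
```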

A.2 Proof of Theorem 1 A.2.1 Verification
We shall show that (ρ̄, ρ̄) is a Nash equilibrium. Suppose that player −i plays the leveling strategy ρ̄. The probability of simultaneous discovery is zero since the strategy is t-continuous. By Equation (3.2), the payoff of player i with strategy ρ_i is We shall show that u_i(ρ_i, ρ̄) ≤ u_i(ρ̄, ρ̄) for any strategy ρ_i. By construction in Equation (3.5), the integrand ḡ(t, x) is bounded from above by the leveling function ḡ(t). Therefore, For x ∈ X, let κ_x ∈ ∆(T) be the Lebesgue–Stieltjes measure induced by ρ_i(·, x). Then where the last equality follows from the capacity constraint of the distributional strategy ρ_i. Thus ∫_X κ_x dx ∈ ∆(T) is the Lebesgue measure by the Carathéodory extension theorem. Therefore, where the first equality is by the definition of Lebesgue–Stieltjes integration and the second equality follows from Fubini's theorem. Combining (A.2) and (A.4), player i's payoff from playing ρ_i is bounded by The payoff of playing ρ̄ is where the first equality is due to Equation (3.2), the second equality holds because ρ̄(t, x) > 0 only if ḡ(t, x) = ḡ(t), and the third equality follows from Equation (A.4).
Combining (A.5) and (A.6), we have shown that u_i(ρ_i, ρ̄) ≤ u_i(ρ̄, ρ̄). By symmetry, the profile of leveling strategies (ρ̄, ρ̄) is a Nash equilibrium, and each player obtains an expected equilibrium payoff of π/2.

A.2.2 Uniqueness
Overview. The uniqueness follows from three lemmas in this section. Lemma 2 states that, in equilibrium, each player searches only within the upper contour set of f, the set of alternatives over which the leveling strategy randomizes. Otherwise, he would enjoy a payoff lower than π/2 against the leveling strategy.
Lemma 3 is key to Theorem 1. It states that, in equilibrium, the posterior declines fastest on the upper contour set. If instead the posterior declined more slowly on some subset of the upper contour set than on another during a period of time, the opponent could devise a modified leveling strategy that searches the former in place of the latter just before the period, and vice versa just after the period. The modification generalizes the "one-step ahead" strategy in Section 3.1. The opponent's strategy would then preempt the player's strategy and yield a higher payoff than the leveling strategy, which cannot happen in equilibrium of a constant-sum game.
Lemma 3 has two useful implications. Corollary 2 establishes the t-continuity of the equilibrium strategy: if the posterior dropped discontinuously for some alternatives at some time, then the posterior of all alternatives in the upper contour set would have to drop discontinuously, violating the capacity constraint. By applying Lemma 3 twice, we then obtain Corollary 3: the decrease in posterior must be equal across the upper contour set.
Lemma 4 computes the equilibrium posterior within the upper contour set. As the decrease in posterior is constant across this set of alternatives by Corollary 3, the posterior is pinned down by the capacity constraint and the initial condition, and it coincides exactly with the leveling posterior defined by the leveling strategy.

Proof of the Theorem.
It is sufficient to consider strategies and deviations ρ that satisfy ρ(1, x) = 1 for all x ∈ X.
These strategies explore all alternatives for sure, and do so at full capacity by t = 1. For any strategy ρ_i that does not satisfy the restriction, there exist multiple strategies ρ′_i that satisfy it with ρ′_i ≥ ρ_i, obtained by exploring more alternatives with higher probability and/or earlier.
The added or expedited explorations under ρ′_i do not lead to any additional probability of discovery, so ρ′_i does not change the opponent −i's payoff either. Formally, the sum of the changes in payoffs equals the change in the probability of discovery, by the integration by parts for Lebesgue–Stieltjes integrals. Since ρ′_i ≥ ρ_i, the change in −i's payoff is nonpositive and the change in total probability is nonnegative. Zero change in i's payoff thus implies zero change in −i's payoff. If (ρ_i, ρ_{−i}) is an equilibrium, then so is (ρ′_i, ρ_{−i}). The multiplicity of ρ′_i would then imply the multiplicity of equilibria within the restricted class of strategies. By contraposition, uniqueness within this class of strategy profiles implies uniqueness for all profiles.
when the arguments are abbreviated. The payoff function can be written as where the inequality follows from an identity of Stieltjes integral In particular, it holds as an equality whenever one of the strategies is t-continuous.
As shown in Equation (A.6) and Equation (A.7), the Nash equilibrium in leveling strategies gives a payoff As the game is constant-sum, the equilibrium strategies in any Nash equilibrium must achieve this payoff. Proof. The statement for t_0 = 0 follows from the initial condition. Suppose there exists time Then the payoff of player i against a leveling strategy of player −i is strictly below the equilibrium payoff: The second equality is due to Equation (A.4). The weak inequality follows from g_{−i}(t, x) ≤ ḡ(t) for all t ∈ T and x ∈ X, and the strict one from the fact that g_{−i}(t, x) < ḡ(t) for all For i ∈ {1, 2}, t ∈ T, and x ∈ X, denote g_i(t⁻, x) := lim_{s↑t} g_i(s, x).

Suppose there exist positive-measure sets A ⊂ H(t_2) and B ⊂ H(t_1) such that
for some a < 0 and ess inf_{A∪B} f > 0; if this is not the case, replace the sets by positive-measure subsets. Fix ϵ ∈ (0, 1). We proceed in two steps.
Step 1: A Modified Leveling Strategy.
For sufficiently small ϵ, it exists and is unique by the continuity and monotonicity of ḡ. In addition, 0 < t_1 − ϵ_1 and t_2 + ϵ_2 < 1. The left- and right-differentiability of ḡ follows from Lemma 1. Consider the following modified leveling strategy ρ̃_{−i}, which, compared with the leveling strategy, searches A at the expense of B over ∆_1 t, and vice versa over ∆_2 t.
If t ∈ (t_1, t_2), let If t ∈ ∆_2 t, let Note that the modified strategy ρ̃_{−i} is a strategy for player −i; in particular, it satisfies the capacity constraint, and it can be verified to be t-continuous.
Step 2: Payoffs from the Modified Leveling Strategy.
Observe that the difference in strategies is d_t ρ̃_{−i} − d_t ρ̄ = −(1/f) d_t ḡ on A and d_t ρ̃_{−i} − d_t ρ̄ = (1/f) d_t ḡ on B over ∆_1 t, and vice versa over ∆_2 t. It is zero otherwise. The change in utility from the modified leveling strategy compared with the leveling strategy is For the first term in (A.8), we perform a change of variable to get where the second equality is due to the dominated convergence theorem. The equation states that, over the short time interval ∆_1 t, both g_i and ∂⁻_t ḡ_{−i} can be taken as constants with respect to time. The same applies to the other three terms.
By supposition, Therefore, there exists ϵ > 0 sufficiently small such that, against ρ_i, the modified leveling strategy ρ̃_{−i} yields a strictly higher payoff than the leveling strategy, which guarantees the maxmin payoff.
The first corollary below establishes the t-continuity of the equilibrium strategy. According to Lemma 3, the posterior decreases fastest over the upper contour set H. Hence, if the posterior dropped discontinuously for some alternatives at some time, the posterior of all alternatives in H would have to drop discontinuously, violating the capacity constraint.

Corollary 2. In any Nash equilibrium, player i's strategy ρ i is t-continuous.
Proof. The statement for t = 0 follows from the monotonicity and right-continuity condition, and that for t = 1 is without loss because the set {x ∈ X : Suppose there exists a positive-measure set B ⊂ X such that ρ_i, or equivalently g_i, is not t-continuous on (0, 1) × B. Without loss of generality, there exist b < 0 and ϵ ∈ (0, 1) such that, for all x ∈ B, there is t_x ∈ (ϵ, 1) satisfying The compactness of [ϵ, 1] implies that, for any δ > 0, there exist t_δ, t̄_δ ∈ (ϵ, 1) with t_δ < t̄_δ and t̄_δ − t_δ < δ, and a positive-measure subset B_δ ⊂ B such that , a positive-measure set. The capacity constraint reads which yields a contradiction as δ ↓ 0.
Corollary 3. In any Nash equilibrium, for 0 < t_1 < t_2 ≤ 1, Proof. Assume t_2 < 1. For any t ∈ (t_1, t_2), H(t_1) ⊂ H(t). Lemma 3 thus gives the equality for x_A, x_B ∈ H(t_1) almost everywhere. The statement is obtained by taking a countable sequence t ↓ t_1, noting that g_i is t-continuous by Corollary 2.
The boundary case t 2 = 1 follows similarly by taking a countable sequence t 2 ↑ 1.
We now derive the right-derivative ∂⁺_t g̃_i. For 0 < t_1 < t_2 < 1, the capacity constraint gives The inequality is due to Lemma 3. Rearranging the terms, where the third inequality is due to the definition of H. The function g̃_i is Lipschitz and thus absolutely continuous on (0, 1).
Take t_2 ↓ t_1. Since H(t_2) ↓ H(t_1) in the set-inclusion sense, the dominated convergence theorem implies that |H(t_2)\H(t_1)| ↓ 0. The second term in Equation (A.10) is dominated by The right-derivative of g̃_i is thus given by Since ḡ also satisfies the first two lemmas and the two corollaries, an analogous calculation shows that Therefore, ḡ = g̃_i + C for some constant C ∈ R. The boundary condition at t = 1 is lim_{t↑1} g̃_i(t) = ḡ(1) = 0, which implies C = 0.
There exists a full measure set over which the inequality holds for all t ∈ [0, 1] ∩ Q. The theorem then follows from the monotonicity and right-continuity condition.

A.3 Proof of Theorem 2
The proof of Theorem 2 follows from five lemmas. The key is Lemma 5, which establishes symmetry in any equilibrium. The idea is that, when one player imitates his opponent's strategy, the duplicated search delays discovery and hence increases the total payoff, half of which goes to the deviating player. Lemma 6 then proves the t-continuity of the equilibrium strategy by considering a generalized one-step-ahead deviation. Lemma 7 shows that the capacity constraint binds in equilibrium. Although the prize increases over time, Proof. Suppose ρ_1 ≠ ρ_2. We shall show that at least one player can profitably deviate by imitating his opponent's strategy, so (ρ_1, ρ_2) is not an equilibrium. It suffices to show that the sum of changes in payoffs, for player 1 to play ρ_2 and player 2 to play ρ_1, is positive.
We may restrict attention to strategies ρ_i that satisfy the terminal condition. Suppose not: there exists a strategy ρ′_i ≥ ρ_i such that ρ′_i(T, x) = 1 a.e.; it searches more in order to satisfy the terminal condition. If the payoff increases, ρ′_i is a profitable deviation and we are done; otherwise player i's payoff remains the same. The additional search always duplicates the opponent's search, so the payoff of the opponent also remains the same. Since ρ_i and ρ′_i are payoff-equivalent, we may replace ρ_i by ρ′_i when constructing a deviation.
We first consider a continuous time preference. The sum of changes in payoffs is The second equality follows from a change of variables and the third from the integration by parts for Lebesgue–Stieltjes integration. The boundary term in the fourth equality vanishes because of the initial and terminal conditions. The integral term is strictly positive since ρ_1 ≠ ρ_2 and β is strictly increasing.
We then consider a possibly discontinuous time preference. Let {β_n}_{n∈N} be a sequence of uniformly bounded, continuous time preferences that converges to β a.e. The dominated convergence theorem implies that the sum of changes in payoffs is For the rest of the proof, we denote by ρ the (symmetric) strategy in an equilibrium.

Lemma 6. ρ is t-continuous.
Proof. Suppose ρ is not t-continuous. We shall show that both players can profitably deviate to a generalized one-step-ahead strategy that preempts the opponent's discontinuous search.
We then construct the generalized one-step-ahead strategy. Let ϵ ∈ (0, ν(T)). Define ρ′: It is straightforward to verify that ρ′ is a strategy. It expedites discontinuous search over [ν⁻¹(ϵ), T] from t to t′(t) := ν⁻¹(ν(t) − ϵ) < t, at the expense of delaying search over [0, ν⁻¹(ϵ)], in the spirit of the one-step-ahead strategy. Note that t′(t) ↑ ν⁻¹(ν(t)) ≤ t as We finally show that ρ′ is a profitable deviation for sufficiently small ϵ. Since the continuous part of ρ′ is the same as that of ρ, we only need to quantify the discontinuous part. The payoff of ρ from the discontinuous part is The payoff of ρ′ from the discontinuous part satisfies The inequality follows because the payoff from simultaneous discovery is nonnegative and . The convergence follows from the dominated convergence theorem because t′(t) ↑ t, ν-a.e., and β( the change in payoff converges to Therefore, there exists a sufficiently small ϵ such that ρ′ is a profitable deviation.
Lemma 7. The capacity constraint binds for ρ.
We construct a profitable deviation that uses the excess capacity early at the expense of search at a later time. Let ϵ ∈ (0, δ). Define ρ′ by It is straightforward to verify that ρ′ is a strategy. It uses an ϵ fraction of the excess capacity to search B during A. The additional search on x ∈ B precludes the last bit of search over The change in payoff is In the first term, we have 1 − ρ(t, x) ≥ δ. In the second term, we have As ϵ ↓ 0, the positive linear term dominates the negative quadratic term, so the change in payoff is positive for sufficiently small ϵ.
Inherited from ρ, it is nonincreasing and right-continuous. Let H(t) := {x ∈ X : f(x) ≥ g̃(t)} be the upper contour set of g̃.
Proof. Suppose there exist t_0 ∈ (0, 1) and a positive-measure set A ⊂ H^C(t_0) such that ρ(t_0, x) > 0 for all x ∈ A. We shall construct a profitable deviation that searches a different set of alternatives B, with posterior close to g̃(t_0), instead of A during [0, t_0]. This would contradict the hypothesis that (ρ, ρ) is an equilibrium.
The definition of g̃ implies that there exists a positive-measure set B ⊂ X such that g(t_0, B) > g̃(t_0) − δ/2 for some δ > 0. By selecting a sufficiently small δ and subsets of A and B, we also have f(A) < g̃(t_0) − δ, ρ(t_0, B) < 1 − δ, and It is straightforward to verify that ρ′ is a strategy. It uses a fraction ϵ of the capacity expended on A during [0, t_0] to search B, and idles its capacity on The change in payoff is In the inequality, we use g(t, B) ≥ g(t_0, B) ≥ g̃(t_0) − δ for the first term, g(t, A) < f(A) < g̃(t_0) − δ for the second term, and g(t, x) ≤ f(x) for the third term. The linear coefficient is positive because β is positive a.e. and ∫_A ρ(t, x) dx is a positive and absolutely continuous measure on [0, t_0]. Therefore, the linear term dominates the quadratic term, so the difference in payoff is positive as ϵ ↓ 0.
Proof. By definition, {x ∈ X : g(t, x) > g̃(t)} is a null set for all t ∈ Q ∩ [0, 1]. Therefore, we may focus on {x ∈ X : g(t, x) ≤ g̃(t) for all t ∈ T}, which is of full measure by the right-continuity of ρ and g̃. Because g̃ is nonincreasing and right-continuous while g(·, x) is continuous, the difference g̃(·) − g(·, x) is lower semi-continuous.
Suppose there exists t 0 ∈ (0, 1) and positive measure set A ⊂ H(t 0 ) such that ρ(t 0 , x) > x) > 0 for all x ∈ A 0 . Without loss of generality, we further haveg(t 0 ) − g(t 0 , x) > η for some η > 0. We shall construct a deviation that searches alternatives with posterior close tog instead of A 0 .
We first identify time interval [t 1 , t 2 ] over which ρ searches some set A with posterior is far from the maximum. Let t x := max{t ∈ (0, t 0 ) : By selecting a sufficiently small η and a subset of A 0 , we further have ρ(t 0 , x)−ρ(t x , x) ≥ η/2.
These properties hold for all x in the selected subset. We then identify a set B whose posterior is close to the maximum. By the definition of ḡ, there exists a positive-measure set B ⊂ X such that ḡ(t) − g(t, B) < δ/2 for all t ∈ [t_1, t_2]. By selecting a sufficiently small δ and subsets of A and B, we further have ρ(t_2, B) < 1 − δ. We now construct the deviation strategy. Let ϵ > 0, and define ρ′. It is straightforward to verify that ρ′ is a strategy: it expends fraction ϵ of the capacity on A during [t_1, t_2] to search B, and idles its capacity on x ∈ B over [t_x, 1], where t_x := min{t ∈ T : ρ(t, x) = ϵ}.
The change in payoff can be bounded as follows. In the inequality, we use g(t, B) > ḡ(t) − δ/2 for the first term, g(t, A) < ḡ(t) − δ for the second term, and β(t) ≤ β(T), g(t, x) ≤ f(x), and ∫_{[t_x, T]} d_t ρ(t, x) = ϵ for the third term. Since β is positive a.e. and ∫_A ρ(t, x)dx is a positive and absolutely continuous measure on [t_1, t_2], the linear coefficient is positive. The linear term therefore dominates the quadratic term, so the difference in payoff is positive as ϵ ↓ 0.
Lemma 7 states that the capacity constraint binds. Therefore, Lemmas 8 and 9 pin ρ down as the leveling strategy ρ̄ by Lemma 1.
We shall call ρ_2 : T × X → [0, 1/α] a normalized strategy. Players' payoffs from the strategy profile (ρ_1, ρ_2^α) can be rewritten as payoffs from (ρ_1, ρ_2) as follows:

u_1(ρ_1, ρ_2^α) = (1 − α)π + α u_1(ρ_1, ρ_2); (A.11)
u_2(ρ_2^α, ρ_1) = α u_2(ρ_2, ρ_1). (A.12)

Therefore, the payoff functions under asymmetric capacity are increasing affine transformations of those under a normalized strategy of player 2. The game with asymmetric capacity is thus strategically equivalent to the game with normalized strategies, and the existence and uniqueness of the Nash equilibrium in the game with asymmetric players will follow from their counterparts in the normalized game. The latter, however, is not quite the same as the symmetric game because of the codomain of the normalized strategy ρ_2: it is not a priori clear that ρ_2(1, ·) = 1 in equilibrium. We close this gap with the following proof strategy. We decompose the maximization over normalized strategies into two components: the (normalized) probability of exploration by the end of the game, ρ_2(1, ·), and the implementation of the exploration given this probability. We shall show that, for any probability of exploration, a generalized leveling strategy is optimal for player 2, and that his payoff given the leveling strategy is uniquely maximized at ρ_2(1, ·) = 1.
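The strategic-equivalence step uses a standard fact: increasing affine transformations of a player's payoff, as in (A.11) and (A.12), leave his best responses unchanged. A minimal numeric sketch with a hypothetical 2×2 payoff matrix (not taken from the paper):

```python
import numpy as np

# Hypothetical payoffs of player 2: rows are his actions, columns are the
# opponent's strategies. The matrix is illustrative only.
u2 = np.array([[3.0, 1.0],
               [0.0, 2.0]])

alpha, pi = 0.4, 1.0
u2_transformed = alpha * u2                 # scaling as in (A.12)
u1_style = (1 - alpha) * pi + alpha * u2    # shift plus scaling as in (A.11)

# Best responses (argmax over own actions) coincide column by column.
for col in range(u2.shape[1]):
    assert u2[:, col].argmax() == u2_transformed[:, col].argmax()
    assert u2[:, col].argmax() == u1_style[:, col].argmax()
```

Because best responses are preserved pointwise, the two games share the same Nash equilibria, which is exactly how the asymmetric game is reduced to the normalized one.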
When ∆ρ_2 = 1, the leveling strategy coincides with the benchmark one. We denote the posterior distribution by g_i(t, x) := f(x)(∆ρ_min(x) − ρ_i(t, x)) and the upper contour set by H(t) := {x ∈ X : f(x)∆ρ_min(x) ≥ ḡ(t)}. Note that these definitions agree with the benchmark case, in which the probability of exploration is one, i.e., ∆ρ_2 = 1.
Theorem 3 is shown via two further results. The key idea is that any strategy of player 2 can be decomposed into a probability of exploration and a corresponding timing of exploration. For a fixed probability of exploration, Lemma 10 characterizes the set of Nash equilibria in the timing game as the set of leveling strategy profiles; its proof is analogous to that of Theorem 1. Lemma 11 then concludes that the symmetric leveling profile from the benchmark case is the unique Nash equilibrium of the normalized game: over all probabilities of exploration, ∆ρ_min = 1 uniquely achieves the highest minimum payoff for player 2.
Lemma 10. For any ∆ρ_2, the profile (ρ_1, ρ_2) is a Nash equilibrium in the timing game if and only if it is a leveling strategy profile.
Proof. The proof is similar to that of Theorem 1; we comment on the three instances in which it requires modification.
The payoff function can be written in a form whose second-term integrand motivates the more general definition of the posterior distribution. The myopic argument is applied first to t ∈ [0, t*], during which the maximum posterior is ḡ(t), attained on H(t), and then to t ∈ (t*, 1], during which the maximum posterior 0 is attained on {x ∈ X : ∆ρ_min = 0}. The equilibrium payoff of a leveling strategy profile (ρ_i, ρ̄_−i) is given by Equation (A.13). In the benchmark case, the three lemmas and the two corollaries apply to T × X; in the timing game, they carry through restricted to [0, t*] × X.
For the result analogous to Lemma 4, the function ḡ_i is defined more generally as sup_{x ∈ H(t)} [g_i(t, x) − g_i(t*, x)], because g_2(t*, ·) may not be zero in the timing game. As in the benchmark case, the boundary condition at t = t* implies that the constant of integration is zero, C = 0. On H(t*) = {x ∈ X : ∆ρ_min(x) > 0}, almost surely, the other boundary condition at t = ḡ^{−1}(f(x)∆ρ_min(x)) < t* establishes the desired result.
Lemma 11. Let ρ̄ be the leveling strategy in the benchmark case. In the normalized game, the symmetric leveling profile (ρ̄, ρ̄) is the unique Nash equilibrium.
Proof. We first argue that the candidate is a Nash equilibrium in the normalized game.
Recall that the leveling strategy in the benchmark case is the unique leveling strategy in the timing game with ∆ρ_2 = 1. Player 1 has the same set of strategies in the timing game with ∆ρ_2 = 1 and in the normalized game; since the profile is a Nash equilibrium in the former, he has no profitable deviation in the latter. Player 2 has no profitable deviation by the myopic argument, because of the leveling posterior g_1.
We now show uniqueness. Any equilibrium of the normalized game is an equilibrium of the timing game with the corresponding normalized probability of exploration, since the strategy set of the former game is a superset of that of the latter. For each ∆ρ_2, Lemma 10 characterizes the set of Nash equilibria of the timing game as the set of leveling profiles, with equilibrium payoff given by Equation (A.13). For ∆ρ_1 = 1, the function 1 − min{1, ∆ρ_2(x)}∆ρ_2(x) + (1/2)(min{1, ∆ρ_2(x)})² of ∆ρ_2(x) is uniquely maximized at ∆ρ_2(x) = 1. Therefore, the equilibrium payoff π/2 can only be achieved with ∆ρ_2 = 1 almost everywhere, that is, with the corresponding strategy profile (ρ̄, ρ̄).
The weak inequality holds because ρ̄ is a best response to itself for player 1. The strict inequality holds because ∆ρ_2(·) ≢ 1. Therefore, ρ̄ is a profitable deviation for player 2 in the normalized game.

A.5 Proof of Theorem 4
Proof. It suffices to show that P_γ(t) is increasing in γ for all t ∈ T. Differentiating Equation (3.4) with respect to time, we obtain an expression in which the boundary (Leibniz) term due to the changing domain of integration vanishes, because 1 − ḡ(t)/f(x) = 0 on {x ∈ X : f(x) = ḡ(t)}.
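The vanishing of the boundary term can be checked in a parametric example; the particular f and ḡ below are hypothetical choices, not the paper's objects.

```python
import math

# Hypothetical example: alternatives x ~ U(0, 1) with prior f(x) = x and a
# leveling function gbar(t) = 1 - t/2 (both illustrative). For
#   P(t) = ∫_0^1 max(0, 1 - gbar(t)/f(x)) dx,
# the Leibniz boundary term vanishes since the integrand is zero where
# f(x) = gbar(t), leaving
#   P'(t) = ∫_{f(x) >= gbar(t)} (-gbar'(t)) / f(x) dx.

def gbar(t):
    return 1.0 - t / 2.0

def P(t, n=200_000):
    # midpoint Riemann sum over x in (0, 1)
    return sum(max(0.0, 1.0 - gbar(t) / ((i + 0.5) / n)) for i in range(n)) / n

t = 0.6
lhs = (P(t + 1e-4) - P(t - 1e-4)) / 2e-4     # numerical derivative of P
rhs = 0.5 * math.log(1.0 / gbar(t))          # ∫_{gbar(t)}^1 (1/2)/x dx
assert abs(lhs - rhs) < 1e-3
```

Here P(t) plays the role of the probability of discovery in Equation (3.4); the check confirms that no extra boundary term is needed when differentiating under the integral sign.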
As the leveling function ḡ is absolutely continuous, the probability of discovery is absolutely continuous in γ, with a derivative almost everywhere on (0, 1). Integration by parts and then a change of variables give the desired comparison.

The set S is closed by the continuity of v_1 and v_2, and it contains t = 1 by assumption. The inequality holds trivially at t = 0. Denote S* := S ∪ {0}.
For any t ∉ S*, define two endpoints t̲ := max{s ∈ S* : s < t} and t̄ := min{s ∈ S* : s > t}; they are well defined because S* is closed. The difference v_1(s) − v_2(s) has the same sign over (t̲, t̄) by continuity, so the integral ∫_0^{t′}(v_1(s) − v_2(s))ds is monotonic in t′ over the same interval. As the desired inequality holds at the endpoints, it holds over the entire interval, and at t in particular.

As the probability of simultaneous discovery is zero, the flow probabilities of discovery are 2ḡ_1 and 2ḡ_2, respectively. The desired conclusion is ∫_0^t nḡ_1(s)ds ≥ ∫_0^t nḡ_2(s)ds for all t ∈ Y. Since ḡ_1 and ḡ_2 satisfy the assumptions of Lemma 12, it suffices to show that their inverses satisfy ∫_0^z h_1(y)dy ≤ ∫_0^z h_2(y)dy for all z ∈ R_+. This follows by Equation (A.17) and the Fubini theorem.

Consider the minimization problem subject to the capacity constraint at t. Without the constraint ρ(t, ·) ≤ 1, the relaxed problem is equivalent to the first-order condition (n − 1)f(x)∂_t ρ(t, x)(1 − ρ(t, x))^{n−2} = C on {x ∈ X : ρ(t, x) > 0} almost everywhere, together with the complementary-slackness condition (n − 1)f(x)∂_t ρ(t, x) ≤ C on {x ∈ X : ρ(t, x) = 0} almost everywhere, for some Lagrange multiplier C ≥ 0. It is then straightforward to show that the leveling strategy ρ̄(t, ·) solves the two conditions and hence the minimization problem.
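The leveling logic behind the first-order condition can be sketched in a hypothetical static analogue (all names and numbers are illustrative, and unlike the paper's condition it omits the time derivative of ρ): choose rho_i ∈ [0, 1] with total capacity c to equalize the marginal values (n − 1)f_i(1 − rho_i)^{n−2} wherever rho_i > 0, bisecting on the multiplier C.

```python
# Hypothetical discrete analogue of a leveling (water-filling) first-order
# condition: (n-1)*f_i*(1-rho_i)**(n-2) = C where rho_i > 0, and
# (n-1)*f_i <= C where rho_i = 0. Requires n > 2.

def leveling(f, n, c, iters=100):
    lo, hi = 0.0, (n - 1) * max(f)   # bracket the Lagrange multiplier C
    for _ in range(iters):
        C = (lo + hi) / 2
        # invert the first-order condition, truncating at rho_i = 0
        rho = [max(0.0, 1.0 - (C / ((n - 1) * fi)) ** (1.0 / (n - 2)))
               for fi in f]
        if sum(rho) > c:
            lo = C   # capacity overused: raise the multiplier
        else:
            hi = C
    return rho

f, n, c = [1.0, 2.0, 4.0], 3, 1.0
rho = leveling(f, n, c)
assert abs(sum(rho) - c) < 1e-6
# marginal values are leveled wherever rho_i > 0
marg = [(n - 1) * fi * (1 - ri) ** (n - 2) for fi, ri in zip(f, rho) if ri > 0]
assert max(marg) - min(marg) < 1e-6
```

With f = [1, 2, 4], the multiplier converges to C = 8/3 and rho ≈ [0, 1/3, 2/3]: the least promising alternative receives no capacity and the marginal values are equalized on the rest, which is the leveling shape.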
From here, uniqueness can be shown along the lines of Theorem 1 by constructing a modified leveling strategy that yields a strictly higher payoff, as in the case n = 2. We provide a shorter proof that exploits the strict convexity of minimization problem (A.18) when n > 2: with duplication among the other players, they can no longer hold player i to his minimum payoff with any other strategy.
We proceed to show that the leveling strategy ρ̄ yields a payoff strictly above π/n when all other players employ ρ ≠ ρ̄. Since ρ is not leveling, there exist a time interval (t_0, t_1) and a positive-measure set A ⊂ X such that ρ(t, x) ≠ ρ̄(t, x) for all t ∈ (t_0, t_1) and x ∈ A. The strict convexity of minimization problem (A.18) implies that the minimizer ρ̄(t, ·) is unique. Therefore, the flow payoff of ρ̄ is strictly above the minimum over (t_0, t_1).
For any symmetric strategy profile (ρ, ..., ρ) with ρ ≠ ρ̄, all players therefore have a profitable deviation to ρ̄, so the profile is not an equilibrium.
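The strict-convexity step can be checked numerically: a strictly convex objective evaluated at a midpoint lies strictly below the average of the endpoint values, which is what rules out multiple minimizers. The f, n, and trial points below are hypothetical, as is the discrete form of the objective.

```python
# Hypothetical discrete form of a strictly convex objective for n > 2:
# phi(rho) = sum_i f_i * (1 - rho_i)**(n-1). Strict convexity implies
# phi(midpoint) < average of phi at two distinct points with equal capacity,
# hence the minimizer subject to a capacity constraint is unique.

def phi(rho, f, n):
    return sum(fi * (1 - ri) ** (n - 1) for fi, ri in zip(f, rho))

f, n = [1.0, 2.0, 4.0], 4
a = [0.2, 0.5, 0.3]
b = [0.5, 0.1, 0.4]   # same total capacity as a, different allocation
mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
assert phi(mid, f, n) < (phi(a, f, n) + phi(b, f, n)) / 2
```

If two distinct minimizers existed, their midpoint would be feasible and strictly better, a contradiction; this is the uniqueness used in the proof.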

A.8 Proof of Theorem 7
Proof. Suppose n′ > n. Let ḡ′ and ḡ be the leveling functions associated with n′ players and n players, respectively. Parallel to the proof in the benchmark case, the probabilities of simultaneous discovery are both zero, and the flow probabilities of discovery are n′ḡ′ and nḡ, respectively. By Lemma 12, it suffices to prove the stochastic order between their inverses h′(·/n′) and h(·/n).
By the Fubini theorem, the integral can be written in the required form.

The corresponding conditions, however, are weaker than their counterparts: they must hold when averaged over any measurable event in Ω that has positive probability under P, but not necessarily at each ω ∈ Ω.
If Ω is a singleton, a mixed strategy reduces to a pure strategy as defined in Definition 1.
With realization ω ∈ Ω, an alternative x ∈ X has been explored at or before time t ∈ T if and only if σ(ω, t, x) = 1, analogously to the pure-strategy case. The stochastic time at which alternative x is searched is a random variable on Ω, given by τ(ω, x) = min{t : σ(ω, t, x) = 1}.
Proof. The first part of Theorem 8 follows directly from the definitions of weak measurability and the weak integral, so its proof is omitted.
We show the second part by construction. By the Kolmogorov extension theorem, there exists a probability triple (Ω, F, P) on which the random variables r_x ∼ U(0, 1) are i.i.d. across x ∈ X. Define the candidate mixed strategy σ(ω, t, x) := 1{r_x(ω) ≤ ρ(t, x)}. By construction, it satisfies the initial condition as well as the monotonicity and right-continuity conditions, and it implements the search density.
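The construction σ(ω, t, x) = 1{r_x(ω) ≤ ρ(t, x)} can be checked by simulation; the particular density ρ below is a hypothetical example, not the paper's.

```python
import random

# Simulation check: across draws of r_x ~ U(0,1), the fraction of
# realizations with sigma = 1 approximates rho(t, x), and sigma is
# nondecreasing in t along every realization ("once explored, always
# explored"). The density rho is a hypothetical example.

def rho(t, x):
    return min(1.0, t * (1.0 + x))   # nondecreasing in t, rho(0, x) = 0

random.seed(0)
draws = [random.random() for _ in range(200_000)]   # i.i.d. r_x across omega

t, x = 0.4, 0.5
sigma = [1 if r <= rho(t, x) else 0 for r in draws]
assert abs(sum(sigma) / len(draws) - rho(t, x)) < 0.01

# Monotonicity in t along each realization.
for r in draws[:1000]:
    flags = [1 if r <= rho(s / 10, x) else 0 for s in range(11)]
    assert flags == sorted(flags)
```

The simulation illustrates both clauses of the construction: the marginal probability matches the search density, and each realized path is monotone and right-continuous in t.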
We remark that the mixed-strategy implementation is not unique.