This simulation visualizes Nash Equilibrium through the Spatial Prisoner's Dilemma—a cellular automaton where each cell represents an agent playing the Prisoner's Dilemma against its neighbors. This demonstrates how local interactions can lead to emergent global patterns and stable equilibria in game theory.
The simulation uses a 100×50 grid where each agent plays against its 8 neighbors (Moore Neighborhood). Each round, agents calculate their total score based on interactions with all neighbors using a payoff matrix. After scoring, each agent adopts the strategy of its highest-performing neighbor, creating an evolutionary dynamic that leads to Nash Equilibria.
You can explore how different payoff matrices affect the emergence of cooperation or defection. Adjust the payoff values (Temptation, Reward, Punishment, Sucker) in real-time to see how the game dynamics change. Try different initial conditions: random distributions, all cooperators, all defectors, or a single defector in a sea of cooperators.
NOTE: The simulation uses a toroidal grid (wrapping edges) to eliminate boundary effects, ensuring all agents have 8 neighbors. The evolution rule is deterministic: each agent adopts the strategy of its best-performing neighbor. This creates fascinating patterns where cooperation can thrive in clusters while defection spreads along boundaries.
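The toroidal wrapping can be sketched in a few lines. This is illustrative code, not the simulation's actual source: adding the grid dimension before taking the modulo keeps negative offsets in range, so every cell, including corners, sees exactly 8 neighbors.

```javascript
// Collect the 8 Moore neighbors of (x, y) on a toroidal W×H grid.
const W = 100, H = 50;

function mooreNeighbors(x, y) {
  const neighbors = [];
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue; // skip the cell itself
      // Adding W/H before the modulo keeps negative offsets positive,
      // so edge cells wrap around and still see exactly 8 neighbors.
      neighbors.push([(x + dx + W) % W, (y + dy + H) % H]);
    }
  }
  return neighbors;
}

// The corner cell (0, 0) wraps to the opposite edges:
console.log(mooreNeighbors(0, 0).length); // 8
```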
Math behind the Simulation
The Prisoner's Dilemma is defined by a payoff matrix where two players each choose to Cooperate (C) or Defect (D):
| | Opponent: C | Opponent: D |
|---|---|---|
| You: C | R (Reward) | S (Sucker) |
| You: D | T (Temptation) | P (Punishment) |
The classic Prisoner's Dilemma requires: T > R > P > S and 2R > T + S. In a one-shot game, defection is the dominant strategy. However, in repeated or spatial games, cooperation can emerge as an equilibrium.
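Both conditions are easy to encode. The sketch below (using the classic default values from the Parameters section) stores the matrix as a lookup table and checks the two inequalities:

```javascript
// Payoff matrix as a lookup table: payoff[myMove][theirMove] = my earnings.
const T = 5, R = 3, P = 1, S = 0; // classic defaults

const payoff = {
  C: { C: R, D: S },
  D: { C: T, D: P },
};

// The two classic Prisoner's Dilemma conditions:
const strictOrdering = T > R && R > P && P > S; // T > R > P > S
const mutualBeatsAlternating = 2 * R > T + S;   // 2R > T + S

console.log(strictOrdering && mutualBeatsAlternating); // true
```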
In the Spatial Prisoner's Dilemma, each agent i at position (x, y) plays against its 8 neighbors and calculates its total score:
S_i = Σ_{j ∈ N(i)} Payoff(strategy_i, strategy_j)
where N(i) is the set of neighbors (8 in Moore Neighborhood). After scoring, each agent updates its strategy to match its best-performing neighbor:
strategy_i^(t+1) = strategy_{argmax_j(S_j)}^(t)
where j ranges over all neighbors including the agent itself. This update rule creates an evolutionary dynamic where successful strategies spread spatially, leading to stable Nash Equilibria characterized by clusters of cooperators or defectors.
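The scoring and imitation rules can be sketched on a tiny 3×3 toroidal grid with a single central Defector (all names here are illustrative, not the app's actual code). Note that this tiny example already previews the "Trojan Horse" dynamic discussed later: every Cooperator sees the Defector's high score and imitates it.

```javascript
const T = 5, R = 3, P = 1, S = 0;
const payoff = { C: { C: R, D: S }, D: { C: T, D: P } };
const W = 3, H = 3;
const grid = [
  ['C', 'C', 'C'],
  ['C', 'D', 'C'],
  ['C', 'C', 'C'],
];

const wrap = (v, n) => (v + n) % n;

// S_i: sum of payoffs against all 8 Moore neighbors (toroidal wrap)
function score(x, y) {
  let s = 0;
  for (let dy = -1; dy <= 1; dy++)
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue;
      s += payoff[grid[y][x]][grid[wrap(y + dy, H)][wrap(x + dx, W)]];
    }
  return s;
}

// Imitation rule: adopt the strategy of the highest-scoring cell
// among the agent itself and its 8 neighbors.
function nextStrategy(x, y) {
  let best = grid[y][x];
  let bestScore = score(x, y);
  for (let dy = -1; dy <= 1; dy++)
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue;
      const nx = wrap(x + dx, W), ny = wrap(y + dy, H);
      const s = score(nx, ny);
      if (s > bestScore) { bestScore = s; best = grid[ny][nx]; }
    }
  return best;
}

// The central Defector exploits 8 Cooperators (8 × T = 40), so every
// Cooperator imitates it in the next generation:
console.log(score(1, 1), nextStrategy(0, 0)); // 40 D
```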
The simulation uses double buffering to ensure simultaneous updates: all agents calculate their next strategy based on the current generation, then swap to the new generation atomically. This prevents artifacts that would occur with sequential updates.
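A minimal double-buffering sketch (illustrative; `rule(grid, x, y)` stands in for the best-neighbor update, and the toy rule here just shifts values so the effect of simultaneous updates is visible):

```javascript
// Double buffering: compute every cell's next state from the *current*
// buffer only, then swap buffers in one step.
function step(current, rule) {
  const next = current.map((row) => row.slice()); // the second buffer
  for (let y = 0; y < current.length; y++)
    for (let x = 0; x < current[y].length; x++)
      next[y][x] = rule(current, x, y); // reads only the old generation
  return next; // caller swaps: grid = step(grid, rule)
}

// Toy rule: copy the left neighbor (with wrap). A sequential in-place
// update would smear one value across the row in a single pass; the
// double-buffered step shifts every cell by exactly one instead.
const shiftRight = (g, x, y) => g[y][(x + g[y].length - 1) % g[y].length];
console.log(step([['a', 'b', 'c']], shiftRight)); // → [['c','a','b']]
```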
Why "Cheaters Always Win" is Not Always True
If the logic always led to "Cheaters always win," then complex life (cells, societies, ecosystems) would never have evolved. In Game Theory, the conclusion depends entirely on two factors that you can test in this simulation.
If you observe "Cheaters always win," you are seeing only the Nash Equilibrium of a Single-Shot Game. The magic of the Spatial Iterated version (this simulation) is that it proves Cooperators can win, but they need specific conditions.
1. The Lone Wolf vs. The Phalanx (Geometry Matters)
In a random "soup" (the Random preset), Defectors usually win. This is because a Cooperator stands alone, gets exploited by a neighbor, receives low points, and immediately mimics the successful Defector next round.
However, Cooperators win when they cluster.
- The Defector's Weakness: Defectors punish each other. If a Defector is surrounded by other Defectors, they all get the low Punishment score (P).
- The Cooperator's Strength: If Cooperators form a solid block (like the "Fortress" preset), the ones in the middle interact only with other Cooperators. They get the Reward score (R) from all 8 neighbors.
- The Math: A Cooperator inside a cluster (8×R points) often scores higher than a Defector on the edge of that cluster who exploits 3 neighbors but gets punished by 5 others.
Conclusion: Cheating is better for individuals in a disorganized crowd. Cooperation is better for groups in a structured society.
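The arithmetic behind that claim, using the classic payoffs T=5, R=3, P=1:

```javascript
const T = 5, R = 3, P = 1;

const interiorCooperator = 8 * R;   // 8 cooperating neighbors = 24
const edgeDefector = 3 * T + 5 * P; // exploits 3, punished by 5 = 20

console.log(interiorCooperator > edgeDefector); // true – the cluster wins
```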
2. The Payoff Ratio (The "b" variable)
The most important lesson from this simulation is that morality is mathematical. Whether cheating is a "good" strategy depends entirely on the Temptation (T) value relative to the Reward (R).
You can prove this in the simulation by changing the slider for T (Temptation):
- If T >> R: The incentive to cheat is so high that Defectors will erode even solid clusters of Cooperators. (The "Wild West" scenario).
- If T ≈ R: The benefit of cheating is marginal. Defectors might win at the edges, but they cannot penetrate a solid block of Cooperators. The cluster holds.
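Under the flat-wall geometry used in the scenarios below (an attacking Defector meets 3 Cooperators and 5 Defectors), the breakeven Temptation can be computed directly. This sketch assumes each defecting neighbor contributes P; with P = 0 (as in the "Chaos"-style arithmetic later in this page) the wall holds while T < 8R/3:

```javascript
// The wall holds while the interior Cooperator outscores the edge
// Defector: 8R > 3T + 5P, i.e. T < (8R - 5P) / 3.
function wallHolds(T, R, P) {
  return 8 * R > 3 * T + 5 * P;
}

// With R = 1 and P = 0, the critical Temptation is 8/3 ≈ 2.67:
console.log(wallHolds(1.9, 1, 0), wallHolds(3.0, 1, 0)); // true false
```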
3. Siege Warfare vs. Cancer/Treason: External vs. Internal Threats
This is one of the most profound insights from the Spatial Prisoner's Dilemma. You have essentially discovered the difference between Siege Warfare (external threats) and Cancer/Treason (internal threats).
In Game Theory terms, this demonstrates that naive cooperation is geometrically robust but topologically fragile.
The Math: Why the "Inside Job" is Deadly
Let's look at the scores using your current "Chaos" settings (T=1.9, R=1.0):
Scenario A: The External Attacker (Siege)
Imagine a solid wall of Cooperators. A Defector comes up against the flat edge of this wall.
- The Defector: Touches 3 Cooperators (and 5 empty/defecting cells).
Score: 3 × 1.9 = 5.7
- The Defending Cooperator (Inside the wall): Touches 8 other Cooperators.
Score: 8 × 1.0 = 8.0
Result: 8.0 > 5.7. The Cooperator earns more points. The wall holds. The Defector cannot break in.
Scenario B: The Internal Traitor (Infection)
Now, imagine a single Defector surrounded by 8 Cooperators (the "Trojan Horse").
- The Traitor (Center): Touches 8 Cooperators. It exploits everyone.
Score: 8 × 1.9 = 15.2
- The Victim (Neighbor): Touches the Traitor + 7 other Cooperators.
Score: (7 × 1.0) + (1 × 0) = 7.0
Result: 15.2 ≫ 7.0. The Traitor's score is more than double the score of the loyal Cooperators next to it.
Consequence: In the next generation, all 8 neighbors look at the center, see that massive score of 15.2, and say, "Wow, that strategy works great!" They all switch to Defect. The "infection" explodes outward.
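The two scenarios reduce to four one-line score calculations. This sketch assumes S = 0 and, like the text above, ignores the siege Defector's other 5 cells:

```javascript
// Scenario arithmetic with the "Chaos" values T=1.9, R=1.0 (S=0 assumed).
const T = 1.9, R = 1.0, S = 0;

// Scenario A: siege – Defector at a flat wall vs. Cooperator inside it
const siegeDefector = 3 * T;   // exploits 3 Cooperators ≈ 5.7
const wallCooperator = 8 * R;  // 8 cooperating neighbors = 8.0

// Scenario B: traitor – lone Defector inside the cluster vs. its neighbor
const traitor = 8 * T;         // exploits all 8 neighbors ≈ 15.2
const victim = 7 * R + 1 * S;  // 7 loyal neighbors + the traitor = 7.0

console.log(wallCooperator > siegeDefector); // true – the wall holds
console.log(traitor > victim);               // true – the infection spreads
```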
The Biological Analogy: Cancer vs. Skin
Your simulation is modeling why complex life forms need immune systems:
- External Robustness: This is like your skin. It protects you from bacteria on the outside because the "cooperating" cells are tightly knit and support each other. External threats face a united front.
- Internal Fragility: This is cancer. If a single cell inside your body decides to "defect" (grow uncontrollably without respecting neighbors), it has access to all the resources (blood/sugar) of the neighbors without the competition found outside. It grows rapidly and kills the host.
Try This Experiment
- Select the "Fortress" preset (a solid block of Cooperators).
- Click "Start" and watch how the Defectors outside cannot penetrate the fortress walls.
- Click "Stop", then manually click on a cell inside the fortress to flip it to Defect.
- Click "Start" again and watch how the single internal defector explodes and destroys the entire cluster from within.
This demonstrates why societies, organizations, and biological systems need immune systems, quality control, and internal monitoring—not just external defenses.
4. The "Cancer" Problem: Global Consequences
There is a final, darker conclusion from Game Theory. If you run the simulation with high Temptation (T >> R), the Defectors will sweep across the board, turning every cell red.
But look at the score. Once the board is 100% red (Defectors), every cell now scores only P points per interaction (Punishment). When the board was 100% blue (Cooperators), every cell scored R points per interaction.
By "winning," the Defectors destroyed the total wealth of the system. This effectively models cancer or resource depletion:
- The Defector (Cancer) out-competes the Cooperator (Healthy Cell) locally.
- But once the Defector takes over the whole body, the host dies, and the game is over.
Try This Experiment
To see the "Cheater" strategy fail, you simply need to change the environment, not the code:
- Set Temptation (T) to 3.5 (just slightly higher than Reward).
- Set Reward (R) to 3.
- Use the "Fortress" preset (a block of Cooperators).
You will see the Defectors attack the walls of the fortress, but they will fail to break in. The Cooperators will survive forever because their collective support outweighs the individual gain of cheating.
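Checking the experiment's arithmetic, assuming the default P = 1 from the Parameters section for the attacker's 5 defecting neighbors:

```javascript
// Fortress experiment: T=3.5, R=3, default P=1 assumed.
const T = 3.5, R = 3, P = 1;

const edgeDefector = 3 * T + 5 * P;  // attacker at the flat wall = 15.5
const insideCooperator = 8 * R;      // defender inside the wall = 24

console.log(insideCooperator > edgeDefector); // true – the fortress holds
```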
Usage Example
Follow these steps to explore Nash Equilibrium in spatial games:
- Initial Setup: When you first load the simulation, the grid starts with all agents as defectors (red). Click one of the preset buttons to set an initial configuration:
  - Random: Creates a random distribution of cooperators (green) and defectors (red)
  - All Cooperate: Sets all agents to cooperate (green grid)
  - All Defect: Sets all agents to defect (red grid)
  - Single Defector: Places one defector in the center of a field of cooperators
  - Checkerboard: Creates an alternating pattern of cooperation and defection
- Adjust Payoff Matrix: Use the sliders to adjust the payoff values (T, R, P, S). The classic Prisoner's Dilemma requires T > R > P > S. Experiment with different values to see how they affect the emergence of cooperation. Higher T (Temptation) makes defection more attractive, while higher R (Reward) encourages cooperation.
- Start the Simulation: Click "Start" to begin the evolution. Watch how strategies spread across the grid. Notice how:
  - Clusters of cooperators can survive if they're large enough
  - Defectors spread along boundaries between cooperator clusters
  - Small cooperator clusters get invaded by defectors
  - Large cooperator clusters can expand and stabilize
- Observe Nash Equilibrium: The simulation converges to a Nash Equilibrium where no agent can improve by changing strategy. You'll see:
  - Stable clusters of cooperators surrounded by defectors
  - Complex boundaries where strategies meet
  - Oscillating patterns in some configurations
  - Complete dominance by one strategy in others
- Experiment with Speed: Adjust the "Speed (fps)" slider to control the simulation speed. Higher values show evolution faster, while lower values let you observe each generation in detail. Use slower speeds to study how strategies spread and how boundaries form.
- Try Custom Logic: Use the "Custom Initialization Logic" textarea to define complex initial patterns. Enter a JavaScript expression that returns 'C' or 'D' based on position. Examples:
  - Radial Pattern: `Math.sqrt((x-w/2)**2 + (y-h/2)**2) < 15 ? 'C' : 'D'`
  - Vertical Stripes: `x % 10 < 5 ? 'C' : 'D'`
  - Diagonal Stripes: `(x+y) % 20 < 10 ? 'C' : 'D'`
- Study Pattern Formation: Observe how spatial structure affects the outcome:
  - Start with "Single Defector" and watch how defection spreads
  - Try "All Cooperate" and introduce a few random defectors
  - Use "Checkerboard" to see how unstable patterns evolve
  - Create custom patterns to test specific hypotheses
- Compare Different Payoffs: Reset and try different payoff matrices:
  - Classic PD: T=5, R=3, P=1, S=0 (cooperation can emerge in clusters)
  - High Temptation: T=10, R=3, P=1, S=0 (defection dominates)
  - High Reward: T=5, R=8, P=1, S=0 (cooperation spreads easily)
  - Equal Payoffs: T=R=P=S (neutral evolution, random patterns)
- Monitor Statistics: Watch the statistics panel to track:
  - Generation count (how many evolution steps)
  - Number and percentage of cooperators vs. defectors
  - Whether the system has reached equilibrium (stable counts)
Tip: The key insight is that spatial structure allows cooperation to thrive even when defection would dominate in a well-mixed population. Large clusters of cooperators provide mutual benefits, making them resistant to invasion by defectors. This demonstrates how local interactions can lead to global cooperation—a fundamental principle in evolutionary game theory.
Parameters
The following are short descriptions of each parameter:
- Temptation (T): The payoff when you defect while your neighbor cooperates. This is typically the highest payoff in the Prisoner's Dilemma. Higher values make defection more attractive. Default is 5.
- Reward (R): The payoff when both you and your neighbor cooperate. This is the second-highest payoff. Higher values encourage cooperation. Default is 3.
- Punishment (P): The payoff when both you and your neighbor defect. This is typically the second-lowest payoff. Default is 1.
- Sucker (S): The payoff when you cooperate while your neighbor defects. This is typically the lowest payoff. Default is 0.
- Speed (fps): Controls the frames per second of the simulation. Higher values (up to 60) make evolution faster, while lower values (1-5) show detailed progression. Default is 10 fps.
- Grid Size: The simulation uses a 100×50 grid (5,000 agents). Each agent interacts with its 8 neighbors (Moore Neighborhood). The grid wraps around (toroidal topology) to eliminate boundary effects.
- Evolution Rule: Each generation, every agent calculates its total score from interactions with all 8 neighbors. Then each agent adopts the strategy of its highest-scoring neighbor (including itself). Updates are applied simultaneously using double buffering.
Buttons and Controls
The following are short descriptions of each control:
- Start/Stop: Toggles the simulation. When running, the button shows "Stop" and turns red. Click to pause and observe the current state. The simulation evolves generation by generation according to the update rule.
- Reset: Stops the simulation and resets the grid to all defectors. Clears the generation count and statistics. Use this to start fresh experiments.
- Preset Buttons: Quickly initialize the grid with common patterns:
  - Random: Fills the grid with random cooperators and defectors
  - All Cooperate: Sets all agents to cooperate (green)
  - All Defect: Sets all agents to defect (red)
  - Single Defector: One defector at center, rest cooperate
  - Checkerboard: Alternating pattern of cooperation and defection
- Apply Custom Logic: Executes the JavaScript code in the textarea to initialize the grid. The code should return 'C' (Cooperate) or 'D' (Defect) for each position (x, y) given the grid dimensions w (width) and h (height).
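One plausible way such a textarea expression could be turned into a per-cell initializer is via the `Function` constructor. This is an illustrative guess, not necessarily the app's actual mechanism, and note that `new Function` executes arbitrary code, so it is only safe for user-supplied input running in the user's own browser:

```javascript
// Compile the expression string once; call the resulting function per cell.
// The expression sees x, y, w, h and must evaluate to 'C' or 'D'.
function buildInitializer(expression) {
  return new Function('x', 'y', 'w', 'h', `return (${expression});`);
}

const init = buildInitializer("(x + y) % 2 === 0 ? 'C' : 'D'");
console.log(init(0, 0, 100, 50), init(1, 0, 100, 50)); // C D
```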
Interaction and Visualization
- Grid Visualization: The grid displays cooperators as green cells and defectors as red cells. Each cell represents one agent. The grid wraps around (toroidal) so agents on edges interact with agents on opposite edges.
- Real-time Updates: During simulation, the grid updates generation by generation. Each generation:
  - All agents calculate scores from neighbor interactions
  - All agents simultaneously adopt the best neighbor's strategy
  - The grid is rendered with the new strategy distribution
  - Statistics are updated
- Pattern Formation: Watch how spatial patterns emerge:
  - Cluster Formation: Cooperators form clusters for mutual benefit
  - Boundary Dynamics: Defectors spread along cluster boundaries
  - Invasion: Small clusters get invaded, large clusters expand
  - Equilibrium: The system converges to a stable Nash Equilibrium
- Statistics Panel: Real-time display of:
  - Generation count (evolution steps)
  - Number of cooperators and defectors
  - Percentage of each strategy
- Custom Logic Injection: Write JavaScript expressions to create complex initial patterns. The function signature is implicit: you write an expression that uses the variables x, y, w (width), and h (height). Examples:
  - Simple: `Math.random() > 0.5 ? 'C' : 'D'`
  - Conditional: `(x + y) % 2 === 0 ? 'C' : 'D'`
  - Distance-based: `Math.sqrt((x-25)**2 + (y-25)**2) < 10 ? 'C' : 'D'`