Web Simulator | ShareTechnote

Web Simulation

MLP(Multi Layer Perceptron) I

This tutorial visualizes a Multi-Layer Perceptron with the minimum architecture that can solve a non-linearly separable problem: 2 inputs → 2 hidden neurons → 1 output (the 2-2-1 network). It demonstrates that a hidden layer is what unlocks problems like XOR, which a single perceptron provably cannot solve.

The network connections are color-coded (green = positive weight, red = negative) with thickness proportional to magnitude. You can drive a forward pass by setting the binary inputs, or click Train to run backpropagation with momentum and watch the weights converge in real time.

NOTE: Refer to this note for the underlying theory.

Sections

Mathematical Foundation
Why XOR Needs a Hidden Layer
Training with Momentum
Simulation
Parameters
Buttons
Tips on Implementation
Limitations

Mathematical Foundation

Each neuron computes a weighted sum of its inputs and passes the result through an activation function:

z = Σ w_i · x_i + b → a = σ(z)

For the 2-2-1 network this expands to three sequential layer computations:

h₁ = σ(w₁₁x₁ + w₂₁x₂ + b₁)
h₂ = σ(w₁₂x₁ + w₂₂x₂ + b₂)
y = σ(w_h1·h₁ + w_h2·h₂ + b_o)

That's 6 weights plus 3 biases, 9 trainable parameters in total. The network is trained by adjusting them to minimize the squared error between the output y and the target t.
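Here is a minimal Python sketch of the forward pass above. It assumes the logistic sigmoid as σ and a per-example loss E = ½(t − y)². The ½ is a common convention; the page does not confirm that the simulator uses it. The parameter names `w11` … `bo` are hypothetical labels chosen to mirror the equations. Each neuron carries a bias, as in the general formula z = Σ w_i · x_i + b.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function σ(z) = 1 / (1 + e^(−z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x1: float, x2: float, p: dict) -> tuple:
    """Forward pass of the 2-2-1 network; returns (h1, h2, y)."""
    h1 = sigmoid(p["w11"] * x1 + p["w21"] * x2 + p["b1"])  # hidden neuron 1
    h2 = sigmoid(p["w12"] * x1 + p["w22"] * x2 + p["b2"])  # hidden neuron 2
    y = sigmoid(p["wh1"] * h1 + p["wh2"] * h2 + p["bo"])   # output neuron
    return h1, h2, y

def squared_error(y: float, t: float) -> float:
    """Per-example loss E = ½(t − y)² (assumed form)."""
    return 0.5 * (t - y) ** 2

# With every parameter at 0, each neuron outputs σ(0) = 0.5 regardless of input.
zero = dict.fromkeys(["w11", "w21", "w12", "w22", "wh1", "wh2", "b1", "b2", "bo"], 0.0)
h1, h2, y = forward(1, 0, zero)
print(h1, h2, y)            # 0.5 0.5 0.5
print(squared_error(y, 1))  # 0.125
```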

Why XOR Needs a Hidden Layer

A single-layer perceptron can only draw linear decision boundaries. XOR's truth table is:

`x₁`	`x₂`	AND	OR	XOR
0	0	0	0	0
0	1	0	1	1
1	0	0	1	1
1	1	1	1	0

For AND and OR, a single straight line can separate the 0s from the 1s, so a perceptron can solve them. For XOR it cannot. The two 1s, at (0,1) and (1,0), sit on one diagonal of the unit square, and the two 0s sit on the other. Any line that puts both 1s on one side therefore also puts at least one 0 there. The hidden layer maps the inputs into a new representation in which the two classes are linearly separable, and the output layer then draws that line.
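Both claims can be checked numerically in the Python sketch below, which uses only the standard library. It first searches a coarse grid of weights for a single step-activation perceptron that reproduces each truth table. Finding none for XOR on a grid is only illustrative; the geometric argument above is the actual proof. The sketch then evaluates a 2-2-1 network with hand-picked weights in which h₁ ≈ OR, h₂ ≈ NAND, and the output ≈ AND(h₁, h₂). These weights are hypothetical and lie outside the simulator's [−2, +2] slider range; they are for illustration only. In the hidden space (h₁, h₂), both XOR-true inputs map to about (1, 1), and the XOR-false inputs map to about (0, 1) and (1, 0). The line h₁ + h₂ = 1.5 separates the two groups.

```python
import itertools
import math

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GATES = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}

def step(z: float) -> int:
    return 1 if z > 0 else 0

def single_perceptron_solves(targets: list) -> bool:
    """Grid-search (w1, w2, b) over [-3, 3] in steps of 0.5 for one
    step-perceptron that reproduces the truth table."""
    grid = [v / 2 for v in range(-6, 7)]
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == t
               for (x1, x2), t in zip(INPUTS, targets)):
            return True
    return False

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def xor_net(x1: int, x2: int) -> float:
    """2-2-1 network with hand-picked (hypothetical) weights."""
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # ≈ OR(x1, x2)
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # ≈ NAND(x1, x2)
    return sigmoid(20 * h1 + 20 * h2 - 30)  # ≈ AND(h1, h2) = XOR(x1, x2)

for name, targets in GATES.items():
    print(name, "single perceptron found:", single_perceptron_solves(targets))
# AND True, OR True, XOR False
print("2-2-1 XOR outputs:", [round(xor_net(x1, x2)) for x1, x2 in INPUTS])
# [0, 1, 1, 0]
```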

Training with Momentum

Standard gradient descent updates each weight by:

Δw = −η · ∂E/∂w

With momentum, each update also carries over a fraction μ of the previous update, which is stored as a velocity v:

v ← μ · v − η · ∂E/∂w, then w ← w + v

The momentum coefficient μ ≈ 0.9 keeps weights moving through flat regions of the error surface where standard gradient descent would stall. This matters especially for the 2-2-1 XOR case. The architecture sits right at the minimum capacity needed, so its error surface is full of plateaus and saddle points.
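The sketch below isolates the effect of momentum. It applies both update rules to a single weight on a deliberately shallow bowl, E(w) = 0.005·w², so ∂E/∂w = 0.01·w. This toy loss stands in for a flat region of the real error surface. The toy loss and the helper name `descend` are illustrative assumptions; η = 0.5 and μ = 0.9 are the simulator defaults.

```python
def descend(grad, w0: float, eta: float, mu: float, steps: int) -> float:
    """Gradient descent with momentum; mu = 0 reduces it to plain gradient descent."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = mu * v - eta * grad(w)  # v ← μ·v − η·∂E/∂w
        w = w + v                   # w ← w + v
    return w

def shallow_grad(w: float) -> float:
    """Gradient of the toy loss E(w) = 0.005·w²."""
    return 0.01 * w

plain = descend(shallow_grad, w0=1.0, eta=0.5, mu=0.0, steps=100)
heavy = descend(shallow_grad, w0=1.0, eta=0.5, mu=0.9, steps=100)
print(f"plain GD after 100 steps: w = {plain:.3f}")  # ≈ 0.606, still far from 0
print(f"momentum after 100 steps: w = {heavy:.3f}")  # within 0.01 of the minimum at 0
```

The speed-up comes from velocity accumulating along a consistent gradient direction. On a steady slope, the effective step approaches η/(1 − μ), which is ten times η for μ = 0.9.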

Why momentum is essential here: without it, training on XOR routinely plateaus around an error of 0.4–0.5 and never converges. With μ = 0.9 and a wider initialization ([−2.0, +2.0] instead of [−0.5, +0.5]), the network usually finds a solution within a few thousand iterations, though not always.
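The following sketch puts these pieces together: online backpropagation with momentum on the 2-2-1 sigmoid network. It uses the defaults quoted on this page (η = 0.5, μ = 0.9, initialization in [−2, +2]). Training stops once all four rounded outputs match the XOR table. This is a hypothetical reconstruction, not the simulator's source. The function `train_xor`, the parameter layout, the ½(t − y)² loss, and the epoch cap are all assumptions. Because convergence depends on the starting point, the sketch runs from several seeds.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_xor(seed, eta=0.5, mu=0.9, init=2.0, max_epochs=5000):
    """Online backprop with momentum on a 2-2-1 network.
    Returns (converged, epochs_used)."""
    rng = random.Random(seed)
    # W[j] = [weight from x1, weight from x2, bias] of hidden unit j.
    W = [[rng.uniform(-init, init) for _ in range(3)] for _ in range(2)]
    # V = [weight from h1, weight from h2, bias] of the output unit.
    V = [rng.uniform(-init, init) for _ in range(3)]
    vW = [[0.0] * 3 for _ in range(2)]  # velocities start at zero
    vV = [0.0] * 3

    def forward(x):
        h = [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + W[j][2]) for j in range(2)]
        return h, sigmoid(V[0] * h[0] + V[1] * h[1] + V[2])

    for epoch in range(1, max_epochs + 1):
        for x, t in XOR:  # one weight update per truth-table row
            h, y = forward(x)
            # Output delta for E = ½(t − y)²: ∂E/∂z_out = (y − t)·y·(1 − y)
            d_out = (y - t) * y * (1 - y)
            # Hidden deltas, computed with V before it changes
            d_hid = [d_out * V[j] * h[j] * (1 - h[j]) for j in range(2)]
            for k, a in enumerate((h[0], h[1], 1.0)):
                vV[k] = mu * vV[k] - eta * d_out * a
                V[k] += vV[k]
            for j in range(2):
                for k, a in enumerate((x[0], x[1], 1.0)):
                    vW[j][k] = mu * vW[j][k] - eta * d_hid[j] * a
                    W[j][k] += vW[j][k]
        # Stop once every rounded output matches its target
        if all(round(forward(x)[1]) == t for x, t in XOR):
            return True, epoch
    return False, max_epochs

for seed in range(3):
    ok, epochs = train_xor(seed)
    print(f"seed {seed}: {'converged' if ok else 'stalled'} after {epochs} epochs")
```

A seed that prints "stalled" corresponds to a simulator run in which Randomize is needed.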

Simulation

The interactive simulator is below. Pick a gate (XOR by default), click Train, and watch the iteration count rise and the error drop. If training stalls (this sometimes happens with XOR), hit Randomize to try a different starting point.

Parameters

Parameter	Range / default	Effect
Activation function	Sigmoid / Tanh / Step	Sigmoid & Tanh are trainable. Step cannot be trained by backpropagation: its slope is 0 everywhere except at the threshold, so no gradient reaches the weights. It is shown for comparison.
Gate preset	XOR / AND / OR / NAND / NOR	Selects the target truth table. XOR is the only one that needs the hidden layer.
Network weights `w`	6 sliders, range [−2, +2]	Manually editable for hidden layer (4) and output layer (2). Wide range helps break symmetry.
Inputs `x₁, x₂`	0 or 1 (checkboxes)	Drives a single forward pass.
Learning rate `η`	default 0.5	Step size for weight updates. Stable at 0.5 when paired with momentum.
Momentum `μ`	0–0.99, default 0.9	Velocity carry-over coefficient. Essential for 2-2-1 XOR.
Train update speed	seconds per step, default 0.1	Delay between training iterations — visualization speed only.
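Backpropagation multiplies each error signal by the slope of the activation function, which is why Step is untrainable. The small Python sketch below defines the three activations and their derivatives. The function names are illustrative. Step is assumed to output 0 or 1, with its threshold at z = 0.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0

def step(z):
    return 1.0 if z > 0 else 0.0

def d_step(z):
    # Zero everywhere except the threshold z = 0, where it is undefined;
    # returning 0 means no gradient ever reaches the weights.
    return 0.0

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  σ'={d_sigmoid(z):.4f}  tanh'={d_tanh(z):.4f}  step'={d_step(z):.1f}")
```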

Buttons

Button	Effect
Reset	Stops training, randomizes weights and biases in [−2, +2], clears inputs to [0,0], wipes velocity history.
Randomize	Stops training, randomizes weights and inputs, wipes velocity. Use this when training is stuck.
Train	Runs backpropagation with momentum. Cycles through the 4 truth-table rows, updating the weights after each row. Stops automatically when all 4 decisions are correct.
Test	Cycles through all 4 inputs with the current weights. Reports "Test PASS" or "Test FAIL".
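The bookkeeping behind Reset and Test can be sketched as follows. This is a hypothetical Python reconstruction based on the table above, not the simulator's code. Reset draws parameters uniformly from [−2, +2] and zeroes the momentum velocities. Test rounds the output for each of the four inputs and compares it with the selected gate's truth table. The parameter names and dictionary layout are assumptions. The XOR weights at the end are hand-picked for illustration and lie outside the slider range.

```python
import math
import random

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TRUTH_TABLES = {
    "XOR": [0, 1, 1, 0], "AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1],
    "NAND": [1, 1, 1, 0], "NOR": [1, 0, 0, 0],
}
PARAM_NAMES = ["w11", "w21", "w12", "w22", "wh1", "wh2", "b1", "b2", "bo"]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x1, x2, p):
    h1 = sigmoid(p["w11"] * x1 + p["w21"] * x2 + p["b1"])
    h2 = sigmoid(p["w12"] * x1 + p["w22"] * x2 + p["b2"])
    return sigmoid(p["wh1"] * h1 + p["wh2"] * h2 + p["bo"])

def reset(rng):
    """Reset: new weights and biases in [−2, +2], inputs [0, 0], zero velocity."""
    params = {n: rng.uniform(-2.0, 2.0) for n in PARAM_NAMES}
    velocity = dict.fromkeys(PARAM_NAMES, 0.0)
    return params, velocity, (0, 0)

def run_test(params, gate):
    """Test: check all 4 rows with the current weights."""
    passed = all(round(forward(x1, x2, params)) == t
                 for (x1, x2), t in zip(INPUTS, TRUTH_TABLES[gate]))
    return "Test PASS" if passed else "Test FAIL"

# Hand-picked (hypothetical) weights that implement XOR: h1 ≈ OR, h2 ≈ NAND, y ≈ AND.
xor_params = {"w11": 20, "w21": 20, "b1": -10, "w12": -20, "w22": -20, "b2": 30,
              "wh1": 20, "wh2": 20, "bo": -30}
print(run_test(xor_params, "XOR"))  # Test PASS
print(run_test(xor_params, "AND"))  # Test FAIL
```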

Tips on Implementation

Several lessons learned from building a reliable XOR trainer at this minimal architecture:

XOR is at the architecture limit. 2-2-1 is the smallest strictly layered network (no direct input-to-output connections) that can solve XOR. Its error surface has flat plateaus and saddle points that vanilla gradient descent often cannot escape.
Use momentum. Without it, training plateaus at error ~0.4–0.5 within hundreds of iterations and never converges. With μ = 0.9 it usually converges within thousands.
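Initialize wide. With [−0.5, +0.5], every neuron starts near z = 0, in the almost-linear middle of the sigmoid where σ(z) ≈ 0.5. The untrained network therefore answers roughly 0.5 for every input, and training starts near the error plateau. The range [−2, +2] gives each neuron a distinctly nonlinear response from the start and helps break the symmetry between the two hidden units.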
Online (per-example) updates are used here so that every weight change is visible. Batch updates would give a smoother error curve but less step-by-step feedback.
Reset velocity on restart. Stale momentum from a previous training session can push the network off in a bad direction.
Convergence indicators: error drops below 0.1 and all 4 decision outputs become correct. If you see neither after a thousand iterations, hit Randomize and try again — the starting point matters.
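The initialization tip can be checked directly. This Python sketch draws all 9 parameters uniformly from [−r, +r] and measures how much the untrained network's output varies across the four inputs. The helper names `output_spread` and `mean_output_spread` are hypothetical. With r = 0.5, the spread can never exceed 0.0625, because each sigmoid slope is at most 0.25. The network therefore starts out answering roughly 0.5 for everything. With r = 2, the four outputs already differ noticeably.

```python
import math
import random

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def output_spread(w):
    """max(y) − min(y) over the 4 inputs for parameters
    w = [w11, w21, w12, w22, wh1, wh2, b1, b2, bo]."""
    ys = []
    for x1, x2 in INPUTS:
        h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[6])
        h2 = sigmoid(w[2] * x1 + w[3] * x2 + w[7])
        ys.append(sigmoid(w[4] * h1 + w[5] * h2 + w[8]))
    return max(ys) - min(ys)

def mean_output_spread(r, trials=1000, seed=0):
    """Average spread over random initializations drawn from U[−r, +r]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += output_spread([rng.uniform(-r, r) for _ in range(9)])
    return total / trials

for r in (0.5, 2.0):
    print(f"init in [-{r}, +{r}]: mean output spread = {mean_output_spread(r):.3f}")
```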

Limitations

Fixed 2-2-1 architecture; no way to add neurons or layers in this simulator. See MLP II and MLP III for wider/deeper variants.
Truth-table problems only (4 input combinations). No continuous data, no real-world classification.
Online stochastic updates only, no mini-batch.
Pure backpropagation with momentum — no adaptive optimizers (Adam, RMSprop) or regularization.
Convergence is not guaranteed. Even with momentum + wide init, XOR at this minimum architecture occasionally fails to converge from unlucky starting weights. Use Randomize if so.