Web Simulator | ShareTechnote

MLP (Multi-Layer Perceptron) II

This tutorial extends MLP I from the minimum 2-2-1 architecture to 2-3-1 — 2 inputs, 3 hidden neurons, 1 output. The extra hidden neuron adds capacity that makes XOR training dramatically more reliable.

The forward/backward pass and training loop are the same as in MLP I, but with 9 weights instead of 6. You should observe convergence in roughly 100–500 iterations, versus the thousands of iterations the 2-2-1 case needs when it converges at all.

NOTE: Refer to this note for the theoretical background.

Sections

Mathematical Foundation
Why the Extra Neuron Helps
Simulation
Parameters
Buttons
Tips on Implementation
Limitations

Mathematical Foundation

Each neuron computes the standard MLP unit:

z = Σ w_i · x_i + b → a = σ(z)

For the 2-3-1 network the layer computations expand to:

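h_j = σ(w_1j x₁ + w_2j x₂ + b_j) for j = 1, 2, 3
y = σ(w_h1 h₁ + w_h2 h₂ + w_h3 h₃ + b_o)

Total 9 trainable weights (6 in the hidden layer + 3 in the output layer), plus one bias per neuron (b_j for the hidden neurons, b_o for the output) as in the unit formula above. Training is backpropagation with momentum, identical in form to MLP I.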
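
The layer computations above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code: the function and argument names (`forward`, `w_hidden`, `b_hidden`, `w_out`, `b_out`) are made up for this example, and the bias terms follow the single-unit formula z = Σ w_i · x_i + b (set them to zero to get the weight-only form).

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through a 2-3-1 network (hypothetical helper).

    x        : [x1, x2]
    w_hidden : w_hidden[i][j] is the weight from input i to hidden neuron j (2 x 3)
    b_hidden : 3 hidden biases
    w_out    : 3 hidden->output weights
    b_out    : output bias
    Returns (h, y): the hidden activations and the network output.
    """
    h = [
        sigmoid(sum(w_hidden[i][j] * x[i] for i in range(2)) + b_hidden[j])
        for j in range(3)
    ]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(3)) + b_out)
    return h, y

# With all parameters at zero every neuron sees z = 0, so each outputs sigma(0) = 0.5.
w_hidden = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h, y = forward([1, 0], w_hidden, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0)
print(h, y)  # [0.5, 0.5, 0.5] 0.5
```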

Why the Extra Neuron Helps

XOR is not linearly separable, so the hidden layer must construct a useful intermediate representation before the output neuron can draw a decision boundary. With 2 hidden neurons there is essentially one way the network can do this (up to swapping or mirroring the two neurons); with 3, the error surface has multiple equally good basins and far fewer flat plateaus.

Architecture	Hidden weights	Total weights	Typical XOR convergence
2-2-1 (MLP I)	4	6	Thousands of iterations, or stalls
2-3-1 (MLP II)	6	9	100–500 iterations, almost always succeeds
2-4-1 (MLP III)	8	12	Faster still; over-parameterized for XOR

Capacity vs trainability: in general, over-parameterizing slightly above the minimum needed to express the target function makes optimization far easier. Modern deep networks exploit this routinely — they're typically wildly over-parameterized relative to the function being learned.
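
The weight counts in the table above follow directly from the layer sizes. A small sketch that reproduces them (the helper name `count_weights` is illustrative, and biases are left out to match the table's columns):

```python
def count_weights(n_in: int, n_hidden: int, n_out: int = 1):
    """Return (hidden_weights, total_weights) for an n_in-n_hidden-n_out MLP.

    Biases are not counted, matching the 'weights' columns in the table.
    """
    hidden = n_in * n_hidden      # input -> hidden connections
    output = n_hidden * n_out     # hidden -> output connections
    return hidden, hidden + output

for n_hidden in (2, 3, 4):
    hidden, total = count_weights(2, n_hidden)
    print(f"2-{n_hidden}-1: {hidden} hidden weights, {total} total")
# 2-2-1: 4 hidden weights, 6 total
# 2-3-1: 6 hidden weights, 9 total
# 2-4-1: 8 hidden weights, 12 total
```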

Simulation

The interactive simulator is below. Pick a gate (XOR is the interesting one), hit Train, and watch the iteration counter. You should rarely need Randomize at this architecture.

Parameters

Parameter	Range / default	Effect
Activation function	Sigmoid / Tanh / Step	Sigmoid and Tanh are trainable. Step has zero derivative everywhere except at the threshold, where it is undefined, so it cannot be trained by backprop.
Gate preset	XOR / AND / OR / NAND / NOR	Target truth table. XOR is the headline case.
Network weights	9 sliders, range [−2, +2]	6 input→hidden weights + 3 hidden→output weights, manually editable.
Inputs `x₁, x₂`	0 or 1	Drives a single forward pass.
Learning rate `η`	default 0.5	Weight-update step size. Stable at 0.5 with momentum.
Momentum `μ`	0–0.99, default 0.9	Velocity carry-over. Less critical than at 2-2-1 but still useful.
Train update speed	seconds, default 0.1	Delay between training iterations — visualization speed only.
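
To see why Step cannot be trained by backprop, it helps to put the three activations next to their derivatives. The sketch below is illustrative only: the function names are invented here, and placing the step threshold at z = 0 with a derivative of 0 at that point is an assumption, since the simulator's exact convention is not stated.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

def tanh_act(z: float) -> float:
    return math.tanh(z)

def d_tanh(z: float) -> float:
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0

def step(z: float) -> float:
    return 1.0 if z >= 0 else 0.0   # assumed threshold at z = 0

def d_step(z: float) -> float:
    return 0.0                      # flat on both sides, so no gradient signal

ACTIVATIONS = {
    "sigmoid": (sigmoid, d_sigmoid),
    "tanh": (tanh_act, d_tanh),
    "step": (step, d_step),
}

for name, (f, df) in ACTIVATIONS.items():
    print(f"{name:8s} f(0.5) = {f(0.5):.4f}   f'(0.5) = {df(0.5):.4f}")
```

Backprop multiplies each weight update by the activation's derivative, so with Step every update is zero and the weights never move.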

Buttons

Button	Effect
Reset	Stops training, randomizes weights and biases, clears inputs and velocity.
Randomize	Stops training, randomizes weights and inputs. Rarely needed at 2-3-1.
Train	Runs backpropagation with momentum. Stops automatically when all decisions are correct.
Test	Cycles through all 4 input combinations with the current weights and reports PASS / FAIL.
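
What the Test button does can be approximated as follows. This is a sketch under assumptions, not the simulator's code: the names `GATES`, `predict`, and `run_test` are hypothetical, the decision threshold of 0.5 is assumed, and the hand-picked XOR weights lie outside the slider range [−2, +2] purely so that the outputs are easy to verify by hand.

```python
import math

# Target truth tables for the gate presets.
GATES = {
    "XOR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "AND":  {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":   {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "NAND": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "NOR":  {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0},
}

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, params) -> float:
    """Forward pass of a 2-3-1 sigmoid network; params = (w_hidden, b_hidden, w_out, b_out)."""
    w_hidden, b_hidden, w_out, b_out = params
    h = [sigmoid(w_hidden[0][j] * x[0] + w_hidden[1][j] * x[1] + b_hidden[j])
         for j in range(3)]
    return sigmoid(sum(w_out[j] * h[j] for j in range(3)) + b_out)

def run_test(gate: str, params, threshold: float = 0.5):
    """Cycle through all 4 input combinations; return (passed, rows)."""
    rows = []
    for x, target in GATES[gate].items():
        decision = 1 if predict(x, params) >= threshold else 0
        rows.append((x, target, decision))
    passed = all(target == decision for _, target, decision in rows)
    return passed, rows

# Illustrative XOR solution: hidden neuron 1 acts as OR, neuron 2 as NAND,
# neuron 3 is unused (zero weights), and the output neuron ANDs the first two.
XOR_PARAMS = (
    [[20.0, -20.0, 0.0], [20.0, -20.0, 0.0]],  # w_hidden[i][j]
    [-10.0, 30.0, 0.0],                        # b_hidden
    [20.0, 20.0, 0.0],                         # w_out
    -30.0,                                     # b_out
)

print("PASS" if run_test("XOR", XOR_PARAMS)[0] else "FAIL")  # PASS
print("PASS" if run_test("AND", XOR_PARAMS)[0] else "FAIL")  # FAIL
```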

Tips on Implementation

Capacity helps more than tricks. Going from 2 to 3 hidden neurons gives a flatter, more forgiving error surface — far more impact than tuning learning rate or activation.
Momentum still earns its keep. Even at 2-3-1, μ = 0.9 converges noticeably faster than μ = 0 by pushing through any remaining flat regions.
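Wide init [−2, +2] breaks the symmetry between hidden neurons and starts training away from the flat region around zero weights; with inputs of 0 or 1 it is still narrow enough that most neurons start outside the deeply saturated regions of the sigmoid, where gradients vanish.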
Online updates (one example at a time) give better visual feedback than batch updates.
Convergence indicators: error below 0.1 and all 4 decisions correct. Failure is rare at 2-3-1; if it happens, hit Randomize.
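
The tips above (online updates, momentum, wide init, and the two convergence indicators) can be combined into a small training loop. This is a sketch under several assumptions rather than the simulator's implementation: the error is taken to be the summed 0.5·(y − t)² over the four patterns, one "iteration" is one pass over all four patterns, decisions use a 0.5 threshold, and the flat parameter layout and function names are invented for the example.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def init_params(rng, n_hidden=3, lo=-2.0, hi=2.0):
    """Flat list: 2H hidden weights, H hidden biases, H output weights, 1 output bias."""
    return [rng.uniform(lo, hi) for _ in range(4 * n_hidden + 1)]

def forward(p, x, n_hidden=3):
    H = n_hidden
    h = [sigmoid(p[j] * x[0] + p[H + j] * x[1] + p[2 * H + j]) for j in range(H)]
    y = sigmoid(sum(p[3 * H + j] * h[j] for j in range(H)) + p[4 * H])
    return h, y

def gradient(p, x, t, n_hidden=3):
    """Backprop gradient of E = 0.5 * (y - t)^2 with respect to every parameter."""
    H = n_hidden
    h, y = forward(p, x, H)
    g = [0.0] * len(p)
    delta_out = (y - t) * y * (1.0 - y)
    for j in range(H):
        g[3 * H + j] = delta_out * h[j]                      # hidden -> output weight
        delta_h = delta_out * p[3 * H + j] * h[j] * (1.0 - h[j])
        g[j] = delta_h * x[0]                                # x1 -> hidden weight
        g[H + j] = delta_h * x[1]                            # x2 -> hidden weight
        g[2 * H + j] = delta_h                               # hidden bias
    g[4 * H] = delta_out                                     # output bias
    return g

def train(data, lr=0.5, mu=0.9, max_iters=5000, seed=0, n_hidden=3):
    """Online backprop with momentum. Returns (params, iterations, converged)."""
    rng = random.Random(seed)
    p = init_params(rng, n_hidden)
    v = [0.0] * len(p)                                       # velocity, one per parameter
    for it in range(1, max_iters + 1):
        for x, t in data:                                    # one example at a time
            g = gradient(p, x, t, n_hidden)
            for k in range(len(p)):
                v[k] = mu * v[k] - lr * g[k]                 # carry over velocity
                p[k] += v[k]
        outputs = [forward(p, x, n_hidden)[1] for x, _ in data]
        error = sum(0.5 * (y - t) ** 2 for y, (_, t) in zip(outputs, data))
        correct = all((y >= 0.5) == bool(t) for y, (_, t) in zip(outputs, data))
        if error < 0.1 and correct:
            return p, it, True
    return p, max_iters, False

params, iters, converged = train(XOR)
print(f"converged={converged} after {iters} iterations")
```

Setting `mu=0.0` in the call gives plain online gradient descent, which makes it easy to compare iteration counts with and without momentum for a given seed.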

Limitations

Fixed 2-3-1 architecture; see MLP I (2-2-1) and MLP III (2-4-1) for comparisons.
Truth-table problems only (4 input combinations). No continuous data, no real-world classification.
Online stochastic updates only, no mini-batch.
Plain backpropagation with momentum — no Adam, RMSprop, or other adaptive optimizers.