This tutorial extends MLP I from the minimum 2-2-1 architecture to 2-3-1: 2 inputs, 3 hidden neurons, 1 output. The extra hidden neuron adds capacity that makes XOR training dramatically more reliable. The forward/backward pass and training loop are the same as in MLP I, but with 9 weights instead of 6. You should observe convergence in roughly 100–500 iterations, versus the thousands (or sometimes never) that the 2-2-1 case needs.

Mathematical Foundation

Each neuron computes the standard MLP unit:

z = Σ_i w_i · x_i + b → a = σ(z)
For the 2-3-1 network the layer computations expand to:

h_j = σ(w_{1j} x_1 + w_{2j} x_2) for j = 1, 2, 3
y = σ(w_{h1} h_1 + w_{h2} h_2 + w_{h3} h_3)

Total: 9 trainable weights (6 in the hidden layer + 3 in the output layer). The layer formulas above carry no bias term b, so none is counted. Training is backpropagation with momentum, identical in form to MLP I.

Why the Extra Neuron Helps

XOR is not linearly separable, so the hidden layer must construct a useful intermediate representation before the output neuron can draw a decision boundary. With 2 hidden neurons the network has essentially one way to do this (up to symmetries such as swapping the two hidden neurons); with 3, the error surface has multiple equally good basins and far fewer flat plateaus.
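To make the "not linearly separable" claim concrete, here is a minimal sketch that brute-forces a single thresholded neuron over a coarse grid of weights. The grid, the gate tables, and the helper name `find_linear_separator` are illustrative assumptions, not part of the simulator. Thresholding z at 0 is equivalent to thresholding σ(z) at 0.5.

```python
from itertools import product

# Truth tables for the gates: (x1, x2) -> target output.
GATES = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

# Hypothetical coarse grid of candidate values: -2.0 to 2.0 in steps of 0.5.
GRID = [k / 2 for k in range(-4, 5)]


def find_linear_separator(table, grid):
    """Return the first (w1, w2, b) in the grid for which the single-neuron
    decision z = w1*x1 + w2*x2 + b > 0 matches the table on all four inputs,
    or None if no grid point works."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
               for (x1, x2), t in table.items()):
            return (w1, w2, b)
    return None


for name, table in GATES.items():
    print(name, find_linear_separator(table, GRID))
```

AND and OR each yield some separating triple, while XOR yields `None`. A finer grid would not change that: the four XOR constraints contradict each other for any single linear threshold, which is why a hidden layer is needed.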
Capacity vs trainability: in general, over-parameterizing slightly above the minimum needed to express the target function makes optimization far easier. Modern deep networks exploit this routinely — they're typically wildly over-parameterized relative to the function being learned.
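For readers who want to experiment outside the simulator, the following is a rough pure-Python sketch of the 2-3-1 forward pass and backpropagation with momentum. It follows the layer formulas from the Mathematical Foundation section (9 weights, no bias terms). The squared-error loss, the learning rate, the momentum coefficient, the initialization range, and the exact form of the momentum update are all assumptions chosen for illustration; the simulator's own settings may differ, and convergence from a given random start is not guaranteed.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_weights(rng, scale=1.0):
    """w_hidden[i][j]: input i -> hidden j (6 weights); w_out[j]: hidden j -> output (3 weights)."""
    w_hidden = [[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(2)]
    w_out = [rng.uniform(-scale, scale) for _ in range(3)]
    return w_hidden, w_out


def forward(w_hidden, w_out, x):
    """h_j = sigmoid(w_1j*x_1 + w_2j*x_2), y = sigmoid(sum_j w_hj*h_j)."""
    h = [sigmoid(w_hidden[0][j] * x[0] + w_hidden[1][j] * x[1]) for j in range(3)]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(3)))
    return h, y


def gradients(w_hidden, w_out, x, t):
    """Gradient of the assumed loss E = 0.5*(y - t)^2 for one sample."""
    h, y = forward(w_hidden, w_out, x)
    delta_out = (y - t) * y * (1.0 - y)          # dE/dz at the output neuron
    g_out = [delta_out * h[j] for j in range(3)]
    g_hidden = [[delta_out * w_out[j] * h[j] * (1.0 - h[j]) * x[i]
                 for j in range(3)] for i in range(2)]
    return g_hidden, g_out


def train(samples, rng, lr=0.5, momentum=0.9, epochs=2000):
    """Per-sample updates with classical momentum: v = momentum*v - lr*g; w += v."""
    w_hidden, w_out = init_weights(rng)
    v_hidden = [[0.0] * 3 for _ in range(2)]
    v_out = [0.0] * 3
    for _ in range(epochs):
        for x, t in samples:
            g_hidden, g_out = gradients(w_hidden, w_out, x, t)
            for j in range(3):
                v_out[j] = momentum * v_out[j] - lr * g_out[j]
                w_out[j] += v_out[j]
                for i in range(2):
                    v_hidden[i][j] = momentum * v_hidden[i][j] - lr * g_hidden[i][j]
                    w_hidden[i][j] += v_hidden[i][j]
    return w_hidden, w_out


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

rng = random.Random(0)  # fixed seed only so repeated runs are comparable
w_hidden, w_out = train(XOR, rng)
for x, t in XOR:
    print(x, "target", t, "output", round(forward(w_hidden, w_out, x)[1], 3))
```

Swapping `XOR` for another gate's truth table, or changing the seed, gives a quick way to compare how often different starts settle on a good solution. Adding a bias input to each neuron is a common variation, but it would raise the weight count above the 9 described here.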
Simulation

The interactive simulator is below. Pick a gate (XOR is the interesting one), hit Train, and watch the iteration counter. You should rarely need Randomize at this architecture.

Parameters
Buttons
Tips on Implementation
Limitations