MLP (Multi-Layer Perceptron) III - 3×3 Pattern Recognizer

This tutorial moves the MLP series from learning a single Boolean function (MLP I / MLP II) to learning a real classification task: vertical vs horizontal lines on a 3×3 pixel grid. The architecture is 9-3-1: 9 inputs (pixels), 3 hidden neurons with ReLU, 1 sigmoid output for binary probability.

Despite being small (34 parameters in total), this network demonstrates the central idea that makes deep learning work: hidden neurons learn to be feature detectors. After training you'll see specific neurons that "light up" only for horizontal patterns, and others that respond only to verticals. This is the same mechanism that scales up to recognize objects in CNNs.

NOTE: For underlying theory, see this note.

Mathematical Foundation

The 9-3-1 network computes:

$$h_j = \mathrm{ReLU}\left(\sum_{i=1}^{9} w_{ij}\, x_i + b_j\right), \quad j = 1, 2, 3$$

$$y = \sigma\left(\sum_{j=1}^{3} v_j\, h_j + c\right)$$

where $\mathrm{ReLU}(z) = \max(0, z)$ and $\sigma(z) = 1/(1 + e^{-z})$. The output y is the predicted probability that the pattern is a horizontal line.
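The forward pass above can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulator's actual code: the function names, the row-major pixel order, and the example weight values are all assumptions chosen for readability.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, b, v, c):
    """Forward pass of a 9-3-1 network.

    x: 9 pixel values (assumed row-major order)
    W: 3 lists of 9 weights, W[j][i] = w_ij
    b: 3 hidden biases, v: 3 output weights, c: output bias
    Returns (hidden activations, output probability).
    """
    h = [relu(sum(W[j][i] * x[i] for i in range(9)) + b[j]) for j in range(3)]
    y = sigmoid(sum(v[j] * h[j] for j in range(3)) + c)
    return h, y

# Illustrative (made-up) weights, not trained values
W = [[0.1] * 9, [-0.1] * 9, [0.0] * 9]
b = [0.0, 0.0, 0.5]
v = [1.0, 1.0, 1.0]
c = -0.5

x = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # top row switched on
h, y = forward(x, W, b, v, c)
print(h, y)
```

The second hidden neuron receives a negative pre-activation here, so ReLU clamps it to zero. That clamping is what lets a hidden neuron stay silent for patterns it does not match.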

| Layer | Size | Activation | Weights + biases |
|---|---|---|---|
| Input | 9 | none | 0 |
| Hidden | 3 | ReLU | 9 × 3 + 3 = 30 |
| Output | 1 | Sigmoid | 3 × 1 + 1 = 4 |
| Total | | | 30 + 4 = 34 parameters |
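A fully connected layer with n inputs and m outputs has n × m weights and m biases, so a 9-3-1 network has 34 trainable parameters. A short script makes the arithmetic explicit; `count_parameters` is a hypothetical helper written for this check, not part of the simulator.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per input-output pair
        total += fan_out           # one bias per output neuron
    return total

hidden_params = 9 * 3 + 3  # 27 weights + 3 biases
output_params = 3 * 1 + 1  # 3 weights + 1 bias
total_params = count_parameters([9, 3, 1])
print(hidden_params, output_params, total_params)
```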

The loss is binary cross-entropy:

L = −[ t · log(y) + (1 − t) · log(1 − y) ]

where t ∈ {0, 1} is the true label (1 = horizontal, 0 = vertical).
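With a sigmoid output, the derivative of this loss with respect to the pre-sigmoid logit simplifies to y − t. The sketch below computes the loss and checks that identity against a finite-difference estimate. The clipping constant `eps` and the function names are assumptions added for numerical safety and illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y, t, eps=1e-12):
    """Binary cross-entropy for one example; y is clipped to avoid log(0)."""
    y = min(max(y, eps), 1.0 - eps)
    return -(t * math.log(y) + (1 - t) * math.log(1.0 - y))

def grad_wrt_logit(z, t):
    """Analytic dL/dz for a sigmoid output with BCE loss: y - t."""
    return sigmoid(z) - t

def numeric_grad(z, t, step=1e-6):
    """Central finite difference of the loss with respect to the logit."""
    return (bce(sigmoid(z + step), t) - bce(sigmoid(z - step), t)) / (2 * step)

z, t = 0.7, 1
analytic = grad_wrt_logit(z, t)
numeric = numeric_grad(z, t)
print(analytic, numeric)

# A confidently wrong prediction still yields a gradient close to -1
confident_wrong = grad_wrt_logit(-8.0, 1)
print(confident_wrong)
```

The last line shows why this pairing is popular: even when the sigmoid is saturated on the wrong side, the gradient magnitude stays near 1 instead of shrinking toward zero.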

Feature Detection: Why It Works

The hidden layer has only 3 neurons, so the network is forced to compress its 9-dimensional input into 3 useful features. With 3 horizontal-line examples and 3 vertical-line examples, the most efficient feature set the network can learn is "is this row active" or "is this column active" — exactly the discriminating axes.

What to look for after training: open the network diagram and check the per-neuron weights. You will typically see one hidden neuron with strong positive weights to three pixels in the same row (a horizontal-line detector), and another with strong positive weights to three pixels in the same column (a vertical-line detector). The third neuron usually picks up a refinement or redundancy, and the exact split varies with the random initialization. This is feature learning in its simplest form, the same mechanism that scales to convolutional filters in CNNs.
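To see that three hidden neurons are enough, it helps to build one solution by hand. The sketch below uses hypothetical, hand-picked weights (not learned ones) in which each hidden neuron sums the pixels of one row and fires only when all three are on. Training may well find a different decomposition; this is only one possibility, and the row-major pixel order is an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, b, v, c):
    h = [max(0.0, sum(w * xi for w, xi in zip(W[j], x)) + b[j]) for j in range(3)]
    y = sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + c)
    return h, y

# Hand-picked weights: neuron j has weight 1 on the three pixels of row j.
# The bias of -2 means the neuron only activates when all three are on.
W = [[1.0 if i // 3 == j else 0.0 for i in range(9)] for j in range(3)]
b = [-2.0, -2.0, -2.0]
v = [10.0, 10.0, 10.0]  # any active row detector pushes the output up
c = -5.0                # no active detector leaves the output low

def row_line(r):
    return [1 if i // 3 == r else 0 for i in range(9)]

def col_line(k):
    return [1 if i % 3 == k else 0 for i in range(9)]

for r in range(3):
    h, y = forward(row_line(r), W, b, v, c)
    print("row", r, "hidden:", h, "P(horizontal):", round(y, 3))
for k in range(3):
    h, y = forward(col_line(k), W, b, v, c)
    print("col", k, "hidden:", h, "P(horizontal):", round(y, 3))
```

Each horizontal line activates exactly one hidden neuron and gives a probability of roughly 0.99, while every vertical line puts only one pixel in each row, leaves all hidden neurons at zero, and gives a probability below 0.01.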

Training Setup

| Aspect | Choice | Reason |
|---|---|---|
| Hidden activation | ReLU | Sparse activation, no vanishing gradient on the positive side, standard for vision. |
| Output activation | Sigmoid | Maps the logit to a probability in (0, 1) for binary classification. |
| Loss | Binary cross-entropy | Cancels the sigmoid derivative, so the gradient with respect to the logit is y − t and does not vanish when a saturated output is wrong. |
| Weight init | He init (var = 2/fan-in) | Standard for ReLU; keeps activations from collapsing or exploding. |
| Optimizer | SGD + momentum (μ = 0.9) | Same as MLP I / II; consistent across the series. |
| Batch size | 8 | Mini-batch updates are smoother than online (per-example) updates. |
| Training data | 6 clean + 60 noisy patterns | 3 horizontal + 3 vertical "lines" plus 10 noisy variants of each (one random pixel flipped). |
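The data set and initialization described above can be sketched as follows. Several details are assumptions rather than facts about the simulator: the row-major pixel order, drawing He weights from a Gaussian (the source only gives the variance), keeping the clean label on noisy variants, and all function names.

```python
import math
import random

def clean_patterns():
    """3 horizontal lines (label 1) and 3 vertical lines (label 0)."""
    patterns = []
    for r in range(3):
        patterns.append(([1 if i // 3 == r else 0 for i in range(9)], 1))
    for k in range(3):
        patterns.append(([1 if i % 3 == k else 0 for i in range(9)], 0))
    return patterns

def noisy_variant(pixels, rng):
    """Copy of the pattern with one randomly chosen pixel flipped."""
    flipped = list(pixels)
    idx = rng.randrange(9)
    flipped[idx] = 1 - flipped[idx]
    return flipped

def build_dataset(rng, variants_per_pattern=10):
    """Each clean pattern followed by its noisy variants: 6 + 60 = 66 items."""
    data = []
    for pixels, label in clean_patterns():
        data.append((pixels, label))
        for _ in range(variants_per_pattern):
            data.append((noisy_variant(pixels, rng), label))
    return data

def he_init(fan_in, fan_out, rng):
    """Gaussian He initialization: zero mean, variance 2 / fan_in."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)      # fixed seed so the sketch is repeatable
dataset = build_dataset(rng)
W = he_init(9, 3, rng)      # hidden-layer weights, 3 rows of 9
print(len(dataset), "patterns;", len(W), "x", len(W[0]), "hidden weights")
```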

Simulation

The interactive simulator is below. Click pixels in the 3×3 grid to draw a pattern and hit Predict to see the network's classification. Click Train to watch hidden neurons gradually specialize as feature detectors, then come back and test your own ambiguous patterns to see how the network handles them.

Usage

  1. Draw a pattern: click pixels in the 3×3 grid to toggle them on/off. Try a vertical line (3 pixels in one column) or horizontal line (3 pixels in one row).
  2. Predict before training: hit Predict to see what the network says with random weights. Probably wrong — that's the point.
  3. Train: click Train. The grid will cycle through training patterns; hidden neurons light up; accuracy climbs over epochs.
  4. Adjust training delay: slow the animation down to watch each batch being processed.
  5. Inspect feature detectors: after ~50 epochs, stop training and look at the network diagram. Identify which hidden neuron has thick weights to one row (horizontal detector) and which has thick weights to one column (vertical detector).
  6. Test ambiguous inputs: draw an L-shape or random pattern and see how the trained network handles it.
  7. Reset and retry: different random initializations lead to different feature decompositions. Try a few to see the variety.
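For readers who want to reproduce steps 2 to 5 outside the browser, here is a minimal offline training sketch in plain Python. It is not the simulator's source. The seed, the Gaussian He initialization, averaging gradients over each batch, the 50-epoch default, and every name in it are assumptions. Results depend on the random initialization, so no particular accuracy is promised.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_dataset(rng):
    """6 clean lines plus 10 one-pixel-flipped variants of each (66 total)."""
    clean = [([1 if i // 3 == r else 0 for i in range(9)], 1) for r in range(3)]
    clean += [([1 if i % 3 == k else 0 for i in range(9)], 0) for k in range(3)]
    data = list(clean)
    for pixels, label in clean:
        for _ in range(10):
            noisy = list(pixels)
            idx = rng.randrange(9)
            noisy[idx] = 1 - noisy[idx]
            data.append((noisy, label))
    return data

def zeros():
    return {"W": [[0.0] * 9 for _ in range(3)], "b": [0.0] * 3, "v": [0.0] * 3, "c": 0.0}

def forward(x, p):
    z = [sum(w * xi for w, xi in zip(p["W"][j], x)) + p["b"][j] for j in range(3)]
    h = [max(0.0, zj) for zj in z]
    y = sigmoid(sum(vj * hj for vj, hj in zip(p["v"], h)) + p["c"])
    return z, h, y

def train(epochs=50, lr=0.1, mu=0.9, batch_size=8, seed=0):
    rng = random.Random(seed)
    data = make_dataset(rng)
    p = zeros()
    p["W"] = [[rng.gauss(0, math.sqrt(2 / 9)) for _ in range(9)] for _ in range(3)]
    p["v"] = [rng.gauss(0, math.sqrt(2 / 3)) for _ in range(3)]
    vel = zeros()  # momentum velocities, one per parameter
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        epoch_loss = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = zeros()  # gradients accumulated over the batch
            for x, t in batch:
                z, h, y = forward(x, p)
                yc = min(max(y, 1e-12), 1 - 1e-12)
                epoch_loss += -(t * math.log(yc) + (1 - t) * math.log(1 - yc))
                d_out = y - t  # dL/d(logit) for sigmoid + BCE
                g["c"] += d_out
                for j in range(3):
                    g["v"][j] += d_out * h[j]
                    d_hid = d_out * p["v"][j] if z[j] > 0 else 0.0  # ReLU gate
                    g["b"][j] += d_hid
                    for i in range(9):
                        g["W"][j][i] += d_hid * x[i]
            n = len(batch)
            # Momentum update: vel = mu * vel - lr * mean_grad; param += vel
            vel["c"] = mu * vel["c"] - lr * g["c"] / n
            p["c"] += vel["c"]
            for j in range(3):
                vel["v"][j] = mu * vel["v"][j] - lr * g["v"][j] / n
                p["v"][j] += vel["v"][j]
                vel["b"][j] = mu * vel["b"][j] - lr * g["b"][j] / n
                p["b"][j] += vel["b"][j]
                for i in range(9):
                    vel["W"][j][i] = mu * vel["W"][j][i] - lr * g["W"][j][i] / n
                    p["W"][j][i] += vel["W"][j][i]
        history.append(epoch_loss / len(data))
    correct = sum((forward(x, p)[2] >= 0.5) == (t == 1) for x, t in data)
    return p, history, correct / len(data)

params, history, accuracy = train()
print(f"loss: epoch 1 = {history[0]:.3f}, epoch {len(history)} = {history[-1]:.3f}")
print(f"training accuracy = {accuracy:.2f}")
for j, row in enumerate(params["W"]):
    print("hidden neuron", j, [round(w, 1) for w in row])
```

Printing each hidden neuron's nine weights in groups of three is a quick text substitute for the network diagram: a row or column of large positive values marks a line detector. Changing `seed` plays the role of step 7.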

Parameters

| Parameter | Range / default | Effect |
|---|---|---|
| Input grid 3×3 | 9 toggles | Click pixels on/off to draw the input pattern. |
| Learning rate η | default 0.1 | Step size for weight updates. |
| Momentum μ | 0–0.99, default 0.9 | Velocity carry-over coefficient. |
| Training delay | 0.1x–10x | Animation speed only; doesn't affect training quality. |
| Training set | 66 patterns | 3 clean horizontal + 3 clean vertical + 60 noisy variants (1 pixel flipped). |
| Batch size | 8 (fixed) | Mini-batch gradient accumulation before each weight update. |
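How η and μ interact is easiest to see on a single scalar parameter. The sketch below assumes the classical momentum form (velocity = μ · velocity − η · gradient, then parameter += velocity) and assumes the batch gradient is the mean of the per-example gradients. The simulator may use an equivalent variant, and the function names are hypothetical.

```python
def batch_mean(grads):
    """Average per-example gradients accumulated over one mini-batch."""
    return sum(grads) / len(grads)

def momentum_step(param, velocity, grad, lr=0.1, mu=0.9):
    """One SGD + momentum update for a single scalar parameter."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity

# A batch of 8 per-example gradients that all equal 1.0 (illustrative values)
grad = batch_mean([1.0] * 8)

param, velocity = 0.0, 0.0
trajectory = []
for _ in range(3):
    param, velocity = momentum_step(param, velocity, grad)
    trajectory.append((param, velocity))
print(trajectory)
```

Under a constant gradient the velocity grows from −0.1 to −0.19 to −0.271 and would level off near −η / (1 − μ) = −1.0, ten times the plain SGD step. This is why μ = 0.9 speeds up progress along consistent directions, and why a large η combined with a high μ can overshoot.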

Buttons and Controls

| Button | Effect |
|---|---|
| Clear | Turns all pixels off in the input grid. |
| Predict | Runs a forward pass on the current pattern and shows the predicted probability of "horizontal". |
| Train | Runs mini-batch SGD + momentum on the 66-pattern dataset until stopped. |
| Stop | Halts training; keeps the weights learned so far. |
| Reset Weights | Re-initializes with He init, clears momentum, resets epoch counter. |

Limitations

  • Tiny 3×3 binary input. Real image recognition works on images with thousands to millions of pixels, usually in several color channels.
  • Fully connected, not convolutional. CNNs share weights across positions, so a detector learned in one place applies everywhere in the image; this MLP must learn separate weights for every pixel position.
  • Only two output classes. Multi-class classification requires softmax + categorical cross-entropy.
  • No regularization (dropout, weight decay). Sufficient for this small problem; required at scale.
  • Mini-batch SGD only; no Adam, RMSprop, learning-rate schedules, or other modern training tricks.