Web Simulator | ShareTechnote

MLP (Multi-Layer Perceptron) III - 3×3 Pattern Recognizer

This tutorial moves the MLP series from learning a single Boolean function (MLP I / MLP II) to learning a real classification task: vertical vs horizontal lines on a 3×3 pixel grid. The architecture is 9-3-1: 9 inputs (pixels), 3 hidden neurons with ReLU, 1 sigmoid output for binary probability.

Despite being small (34 parameters in total: 27 hidden weights, 3 hidden biases, 3 output weights, 1 output bias), this network demonstrates the central idea that makes deep learning work: hidden neurons learn to be feature detectors. After training you'll see specific neurons that "light up" only for horizontal patterns, and others that respond only to verticals. This is the same mechanism that scales up to recognize objects in CNNs.

NOTE: For underlying theory, see this note.

Sections

Mathematical Foundation
Feature Detection: Why It Works
Training Setup
Simulation
Usage
Parameters
Buttons and Controls
Limitations

Mathematical Foundation

The 9-3-1 network computes:

h_j = ReLU( Σ_{i=1}^{9} w_ij · x_i + b_j ),  for j = 1, 2, 3
y = σ( Σ_{j=1}^{3} v_j · h_j + c )

where ReLU(z) = max(0, z) and σ(z) = 1 / (1 + e^(−z)). The output y is the predicted probability that the pattern is a horizontal line.
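The two equations above can be written out as a short forward pass. This is a minimal sketch, not the simulator's own code: the function names, the row-major pixel order, and the nested-list layout W[j][i] for w_ij are assumptions made for illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Split on the sign of z so exp() never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def forward(x, W, b, v, c):
    """x: 9 pixel values; W[j][i]: weight from pixel i to hidden neuron j;
    b: 3 hidden biases; v: 3 output weights; c: output bias."""
    h = [relu(sum(W[j][i] * x[i] for i in range(9)) + b[j]) for j in range(3)]
    y = sigmoid(sum(v[j] * h[j] for j in range(3)) + c)
    return h, y

# Top row lit, all weights zero: every hidden neuron is 0 and y is exactly 0.5.
x = [1, 1, 1, 0, 0, 0, 0, 0, 0]
h, y = forward(x, [[0.0] * 9 for _ in range(3)], [0.0] * 3, [0.0] * 3, 0.0)
print(h, y)
```

With all-zero parameters the output sits at 0.5, the "no opinion" point of the sigmoid. Training moves it toward 1 for horizontal lines and toward 0 for vertical lines.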

Layer	Size	Activation	Weights + biases
Input	9	none	—
Hidden	3	ReLU	9 × 3 + 3 = 30
Output	1	Sigmoid	3 × 1 + 1 = 4
Total			34 parameters (27 hidden weights + 3 hidden biases + 3 output weights + 1 output bias)
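Counting directly from the two equations (every w_ij, b_j, v_j, and c) gives 9 × 3 + 3 = 30 hidden parameters and 3 + 1 = 4 output parameters, 34 in total. The helper below is a hypothetical convenience function that assumes a fully connected network with one bias per neuron, as in those equations.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [9, 3, 1]."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + one bias per neuron
    return total

print(count_parameters([9, 3, 1]))  # 30 + 4 = 34
```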

The loss is binary cross-entropy:

L = −[ t · log(y) + (1 − t) · log(1 − y) ]

where t ∈ {0, 1} is the true label (1 = horizontal, 0 = vertical).
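A useful property of pairing this loss with a sigmoid output is that the gradient with respect to the output logit z reduces to y − t. The sketch below checks that identity numerically. The function names and the clamp value eps are assumptions for illustration, not taken from the simulator.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y, t, eps=1e-12):
    # Clamp y away from 0 and 1 so log() never receives 0 (eps is an assumed guard).
    y = min(max(y, eps), 1.0 - eps)
    return -(t * math.log(y) + (1 - t) * math.log(1.0 - y))

def dloss_dlogit(z, t):
    # Chain rule: (dL/dy) * (dy/dz) simplifies to y - t for sigmoid + BCE.
    return sigmoid(z) - t

# Central finite difference at an arbitrary logit, compared with y - t.
z, t, d = 0.7, 1, 1e-6
numeric = (bce(sigmoid(z + d), t) - bce(sigmoid(z - d), t)) / (2 * d)
print(abs(numeric - dloss_dlogit(z, t)) < 1e-6)
```

Because the gradient is simply the prediction error, a confidently wrong output (y near 0 when t = 1) still produces a gradient near −1 rather than one that shrinks toward zero.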

Feature Detection: Why It Works

The hidden layer has only 3 neurons, so the network is forced to compress its 9-dimensional input into 3 useful features. With 3 horizontal-line examples and 3 vertical-line examples, the most efficient feature set the network can learn is "is this row active" or "is this column active" — exactly the discriminating axes.

What to look for after training: open the network diagram and check the per-neuron weights. You should see one hidden neuron with strong positive weights to three pixels in the same row (a horizontal-line detector), and another with strong positive weights to three pixels in the same column (a vertical-line detector). The third neuron usually picks up a refinement or redundancy. This is feature learning in its simplest form — the same mechanism that scales to convolutional filters in CNNs.
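To make the idea of a "row detector" concrete, the sketch below sets the weights by hand instead of training them. It is one of many possible solutions (here all three neurons are row detectors) and is not what the simulator necessarily converges to. The specific weight values and the row-major pixel order are assumptions chosen for illustration.

```python
import math

def forward(x, W, b, v, c):
    h = [max(0.0, sum(w * xi for w, xi in zip(W[j], x)) + b[j]) for j in range(3)]
    z = sum(vj * hj for vj, hj in zip(v, h)) + c
    return h, 1.0 / (1.0 + math.exp(-z))

# Hand-picked (not trained) weights: hidden neuron j sums the pixels of row j.
W = [[1.0 if i // 3 == j else 0.0 for i in range(9)] for j in range(3)]
b = [-2.0, -2.0, -2.0]  # a neuron fires only when all three pixels of its row are on
v = [8.0, 8.0, 8.0]     # any active row detector pushes the output toward "horizontal"
c = -4.0                # with no detector active, the output leans toward "vertical"

top_row = [1, 1, 1, 0, 0, 0, 0, 0, 0]
left_col = [1, 0, 0, 1, 0, 0, 1, 0, 0]

h_row, y_row = forward(top_row, W, b, v, c)
h_col, y_col = forward(left_col, W, b, v, c)
print(h_row, round(y_row, 3))  # only the first detector is active
print(h_col, round(y_col, 3))  # no detector is active
```

A vertical line puts exactly one lit pixel in each row, so every pre-activation is 1 − 2 = −1 and ReLU zeroes it out. A full row gives 3 − 2 = 1, which the output layer turns into a probability close to 1.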

Training Setup

Aspect	Choice	Reason
Hidden activation	ReLU	Sparse activation, no vanishing gradient on positive side, standard for vision.
Output activation	Sigmoid	Maps logit to probability in [0, 1] for binary classification.
Loss	Binary cross-entropy	Sigmoid derivative cancels, leaving a logit gradient of y − t; learning does not stall when a saturated output is wrong.
Weight init	He init (var = 2/fan-in)	Standard for ReLU; keeps activations from collapsing or exploding.
Optimizer	SGD + momentum (μ = 0.9)	Same as MLP I / II; consistent across the series.
Batch size	8	Mini-batch — smoother than online updates.
Training data	6 clean + 60 noisy patterns	3 horizontal + 3 vertical "lines" plus 10 noisy variants of each (one random pixel flipped).
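The training set in the last row of the table can be generated in a few lines. This is a sketch under stated assumptions: pixels are stored row-major, the label is 1 for horizontal and 0 for vertical (as in the loss definition), and the helper names and the seed are hypothetical. The simulator's own generator may differ in detail.

```python
import random

def clean_patterns():
    """Return (pixels, label) pairs; label 1 = horizontal, 0 = vertical."""
    data = []
    for r in range(3):    # three horizontal lines, one per row
        data.append(([1 if i // 3 == r else 0 for i in range(9)], 1))
    for col in range(3):  # three vertical lines, one per column
        data.append(([1 if i % 3 == col else 0 for i in range(9)], 0))
    return data

def build_dataset(rng, variants=10):
    data = clean_patterns()
    for pixels, label in clean_patterns():
        for _ in range(variants):
            noisy = list(pixels)
            k = rng.randrange(9)
            noisy[k] = 1 - noisy[k]  # flip one randomly chosen pixel
            data.append((noisy, label))
    return data

dataset = build_dataset(random.Random(0))  # the seed is an arbitrary choice
print(len(dataset))  # 6 clean + 60 noisy = 66
```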

Simulation

The interactive simulator is below. Click pixels in the 3×3 grid to draw a pattern and hit Predict to see the network's classification. Click Train to watch hidden neurons gradually specialize as feature detectors, then come back and test your own ambiguous patterns to see how the network handles them.

Usage

Draw a pattern: click pixels in the 3×3 grid to toggle them on/off. Try a vertical line (3 pixels in one column) or horizontal line (3 pixels in one row).
Predict before training: hit Predict to see what the network says with random weights. Probably wrong — that's the point.
Train: click Train. The grid will cycle through training patterns; hidden neurons light up; accuracy climbs over epochs.
Adjust training delay: slow the animation down to watch each batch being processed.
Inspect feature detectors: after ~50 epochs, stop training and look at the network diagram. Identify which hidden neuron has thick weights to one row (horizontal detector) and which has thick weights to one column (vertical detector).
Test ambiguous inputs: draw an L-shape or random pattern and see how the trained network handles it.
Reset and retry: different random initializations lead to different feature decompositions. Try a few to see the variety.

Parameters

Parameter	Range / default	Effect
Input grid 3×3	9 toggles	Click pixels on/off to draw the input pattern.
Learning rate `η`	default 0.1	Step size for weight updates.
Momentum `μ`	0–0.99, default 0.9	Velocity carry-over coefficient.
Training delay	0.1x–10x	Animation speed only; doesn't affect training quality.
Training set	66 patterns	3 clean horizontal + 3 clean vertical + 60 noisy variants (1 pixel flipped).
Batch size	8 (fixed)	Mini-batch gradient accumulation before each weight update.
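The learning rate and momentum rows interact through the update rule. One common form of SGD with momentum is sketched below on flat parameter lists. The sign convention, the choice to average (rather than sum) gradients over the mini-batch, and the function names are assumptions, since the page does not spell them out.

```python
def batch_mean(per_example_grads):
    """Average per-example gradient lists into one mini-batch gradient."""
    n = len(per_example_grads)
    return [sum(g[k] for g in per_example_grads) / n
            for k in range(len(per_example_grads[0]))]

def momentum_step(params, grads, velocity, lr=0.1, mu=0.9):
    """velocity <- mu * velocity - lr * grad, then param <- param + velocity."""
    for k in range(len(params)):
        velocity[k] = mu * velocity[k] - lr * grads[k]
        params[k] += velocity[k]

params, velocity = [1.0, -2.0], [0.0, 0.0]
grads = batch_mean([[0.5, 1.0], [1.5, -1.0]])  # averages to [1.0, 0.0]
momentum_step(params, grads, velocity)         # first step moves params[0] by -0.1
momentum_step(params, grads, velocity)         # second step moves it by -0.19
print(params, velocity)
```

With the same gradient on consecutive steps, the second step is larger than the first because 90% of the previous velocity carries over. This is the "velocity carry-over" the momentum row refers to.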

Buttons and Controls

Button	Effect
Clear	Turns all pixels off in the input grid.
Predict	Runs a forward pass on the current pattern and shows the predicted probability of "horizontal".
Train	Runs mini-batch SGD + momentum on the 66-pattern dataset until stopped.
Stop	Halts training; keeps the weights learned so far.
Reset Weights	Re-initializes with He init, clears momentum, resets epoch counter.
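What Reset Weights does can be sketched as follows. He initialization draws each weight from a zero-mean normal distribution with variance 2/fan-in. Whether the simulator also applies it to the output layer, and whether biases start at zero, is not stated on the page, so both are assumptions here, as are the dictionary layout and function names.

```python
import math
import random

def he_init(fan_in, fan_out, rng):
    """fan_out rows of fan_in weights drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def reset_state(rng):
    """Fresh weights, zeroed momentum, epoch counter back to 0."""
    return {
        "W": he_init(9, 3, rng),     # hidden weights, fan-in 9
        "b": [0.0] * 3,              # zero biases (assumption)
        "v": he_init(3, 1, rng)[0],  # output weights, fan-in 3 (assumption)
        "c": 0.0,
        "velocity": {
            "W": [[0.0] * 9 for _ in range(3)],
            "b": [0.0] * 3,
            "v": [0.0] * 3,
            "c": 0.0,
        },
        "epoch": 0,
    }

state = reset_state(random.Random())
print(len(state["W"]), len(state["W"][0]), len(state["v"]), state["epoch"])
```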

Limitations

Tiny 3×3 binary input. Real image recognition works on images with thousands to millions of pixels, and the networks involved carry hundreds of feature channels internally.
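Fully connected, not convolutional. CNNs handle this kind of task by sharing weights across positions, which gives them translation invariance; this MLP has a separate weight for every pixel position, so a detector learned for one row or column does not transfer to another.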
Only two output classes. Multi-class classification requires softmax + categorical cross-entropy.
No regularization (dropout, weight decay). It is not needed for a problem this small, but it usually is at scale.
Mini-batch SGD only; no Adam, RMSprop, learning-rate schedules, or other modern training tricks.