Deep AutoEncoder Tutorial - Multi-Layer Compression & Reconstruction
This interactive tutorial provides a visual simulation of a Deep AutoEncoder neural network with intermediate hidden layers. An AutoEncoder is a type of neural network that learns to compress (encode) input data into a compact representation (bottleneck/latent code) and then reconstruct (decode) the original input from that compressed representation. This demonstrates the fundamental concept of dimensionality reduction and feature learning.
The architecture processes 3×3 pixel patterns (9 pixels) through a multi-layer encoder that gradually compresses them: 9 → 6 → 3 (the bottleneck), then through a multi-layer decoder that gradually expands them: 3 → 6 → 9. This creates an hourglass shape that clearly visualizes the "narrowing down" and "expanding up" process. The resulting 9-6-3-6-9 architecture is: 9 input neurons → 6 encoder hidden neurons → 3 bottleneck neurons → 6 decoder hidden neurons → 9 output neurons. The network is trained to minimize the reconstruction error - the difference between the input and the reconstructed output.
The training uses backpropagation with momentum on a dataset of simple 3×3 patterns (horizontal lines, vertical lines, diagonals, corners, cross). As the network trains, you can observe how it learns to compress different patterns through the narrowing layers into the 3-number bottleneck code, then expand them back through the widening layers. The network diagram shows all five layers and their connections, making the compression/expansion process visually clear.
NOTE: This simulation demonstrates Deep AutoEncoder learning in a "glass box" architecture where every weight and activation is visible. The 9-6-3-6-9 architecture with ~168 total parameters is small enough to visualize every connection, yet complex enough to learn meaningful compression patterns. The intermediate hidden layers (6 neurons) create a gradual compression/expansion that is easier to understand than direct 9→3→9 compression. The bottleneck (3 neurons) forces the network to find efficient representations.
Usage Example
Follow these steps to explore how the AutoEncoder learns to compress and reconstruct patterns:
- Initial State: When you first load the simulation, the network has random weights. The input grid is empty (all pixels off). Click on pixels in the 3×3 input grid to create a pattern. The reconstruction output will initially be poor (random gray values) because the network hasn't learned yet.
- Test Before Training: Click pixels to create a pattern and observe:
- The input grid shows your pattern (black = on, white = off)
- The bottleneck bars (positioned above their numerical values) show the compressed representation (3 values between 0 and 1), growing from bottom to top
- The reconstruction grid shows the network's attempt to recreate your pattern (grayscale values)
- The reconstruction error shows how different the output is from the input
- The network diagram shows all connections in a vertical hourglass layout: Input (9) → Encoder Hidden (6) → Bottleneck (3) → Decoder Hidden (6) → Output (9)
- The Learning Rate and Momentum sliders allow you to adjust training parameters
With random weights, the reconstruction will be poor and the error will be high.
- Start Training: Adjust the Learning Rate and Momentum sliders if desired (defaults: 0.15 and 0.8), then click the "Train" button. Watch as the network learns:
- The loss value decreasing over epochs (shown in the control panel)
- The network diagram updating to show weight changes in the vertical hourglass layout
- After training, test your input pattern again - the reconstruction should be much better
- The bottleneck values will encode meaningful information about the pattern (each value represents a different learned feature)
The network trains on a dataset of simple patterns (lines, diagonals, corners, cross). You can adjust Learning Rate and Momentum during training to experiment with different learning dynamics.
- Observe Compression: After training, try different patterns and observe:
- How the bottleneck (3 numbers) encodes different patterns
- Similar patterns produce similar bottleneck codes
- The reconstruction quality improves significantly after training
- The reconstruction error decreases as the network learns
- Test Reconstruction: Try drawing different patterns:
- Horizontal lines (all pixels in one row)
- Vertical lines (all pixels in one column)
- Diagonal patterns
- Corner patterns
- Cross pattern (center row and column)
- Dot (center pixel only)
- Custom patterns you create
Observe how well the network reconstructs patterns it was trained on vs. new patterns.
- Observe the Hourglass Shape: The network diagram shows the hourglass architecture arranged vertically (top to bottom):
- Input layer (9 neurons) - widest at the top
- Encoder hidden layer (6 neurons) - narrowing down
- Bottleneck layer (3 neurons) - narrowest point in the middle (the compressed code)
- Decoder hidden layer (6 neurons) - expanding up
- Output layer (9 neurons) - widest at the bottom
Layer widths are proportional to neuron count, creating a clear visual hourglass that makes the compression and expansion process intuitive. The vertical layout shows data flowing from top (input) through compression (narrowing) to the bottleneck, then through expansion (widening) to the output at the bottom.
- Understand the Bottleneck: The bottleneck is the key concept:
- It forces the network to compress 9 pixels into just 3 numbers
- The intermediate layers (6 neurons) create a gradual compression/expansion
- This compression must preserve enough information to reconstruct the pattern
- The bottleneck values (0-1) represent the compressed code
- Different patterns produce different bottleneck codes
- Each of the 3 values has practical meaning: after training, each neuron learns to detect different features (horizontal, vertical, diagonal/corner patterns)
- The visualization shows bars above their numerical values, growing from bottom to top, making it easy to see activation levels
- Experiment with Parameters: Try adjusting the Learning Rate and Momentum sliders to see how they affect training:
- Higher Learning Rate (0.3-0.5): Faster learning but may overshoot optimal weights
- Lower Learning Rate (0.01-0.05): Slower but more stable convergence
- Higher Momentum (0.9-0.99): Faster convergence, helps escape local minima
- Lower Momentum (0.3-0.5): More conservative updates
- Reset and Retry: Click "Reset" to reinitialize the network with random weights. Try training again and observe how different initializations can lead to different learned compression schemes, though the final reconstruction quality should be similar.
Tip: The key insight is that the bottleneck forces the network to learn a compact representation. After training, notice how the network can reconstruct training patterns well, but may struggle with completely new patterns. This demonstrates the fundamental trade-off in AutoEncoders: compression vs. reconstruction quality. The bottleneck visualization shows how the network encodes patterns into just 3 numbers.
Parameters
The following are short descriptions of each parameter:
- Input Grid (3×3): A clickable 3×3 grid where you can toggle pixels on/off by clicking. Black pixels are "on" (value 1), white pixels are "off" (value 0). Click any pixel to toggle it. The grid automatically updates the reconstruction as you draw. This is the pattern you want the network to compress and reconstruct.
- Deep AutoEncoder Architecture: The network has five layers creating an hourglass shape:
- Input Layer: 9 neurons (3×3 grid)
- Encoder Hidden Layer: 6 neurons (first compression step - narrowing down)
- Bottleneck Layer: 3 neurons (most compressed - the latent code)
- Decoder Hidden Layer: 6 neurons (first expansion step - expanding up)
- Output Layer: 9 neurons (3×3 reconstruction)
Total parameters: (9×6 + 6) + (6×3 + 3) + (3×6 + 6) + (6×9 + 9) = 60 + 21 + 24 + 63 = 168 weights and biases. This is small enough to visualize every connection clearly, and the intermediate layers make the compression/expansion process more intuitive.
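The parameter count above can be checked with a few lines of code (a sketch in Python; the simulation itself need not be written in Python):

```python
# Parameter count for the 9-6-3-6-9 architecture: each layer
# contributes (inputs x outputs) weights plus one bias per output.
layer_sizes = [9, 6, 3, 6, 9]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # weights + biases

print(total)  # 60 + 21 + 24 + 63 = 168
```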
- Bottleneck (Compressed Code): The middle layer with 3 neurons. This is the compressed representation - the network must encode all information about the 9-pixel pattern into just 3 numbers (each between 0 and 1). The visualization shows three bars, one for each bottleneck neuron, positioned above their numerical values. Bars grow from bottom to top (like traditional bar charts), with height representing activation value. Green bars indicate high activation (>0.5), orange bars indicate low activation. Each bottleneck value has practical meaning: after training, each neuron learns to detect different features (e.g., horizontal patterns, vertical patterns, diagonal/corner patterns). The three values together form a unique code that identifies the input pattern.
- Reconstruction Grid (3×3): The output of the decoder, attempting to recreate the input pattern. Cells are displayed in grayscale: darker = higher activation (closer to 1), lighter = lower activation (closer to 0). The border is highlighted (black) when activation > 0.5. This is read-only - you cannot edit it directly.
- Reconstruction Error: Shows the reconstruction error between the input and the output, computed as the sum of squared differences: Error = sum of (input[i] - output[i])² over all 9 pixels (the mean squared error is this sum divided by 9). Lower values mean better reconstruction; a perfect reconstruction has error = 0.
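The error computation can be sketched as follows (the pattern and the "noisy" reconstruction values are illustrative):

```python
# Reconstruction error: sum of squared differences between the
# 9 input pixels and the 9 reconstructed output values.
def reconstruction_error(inputs, outputs):
    return sum((i - o) ** 2 for i, o in zip(inputs, outputs))

pattern = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # top horizontal line
noisy = [0.9, 0.8, 1.0, 0.1, 0.0, 0.2, 0.0, 0.1, 0.0]

print(reconstruction_error(pattern, pattern))  # 0 (perfect reconstruction)
print(reconstruction_error(pattern, noisy))    # about 0.11
```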
- Network Visualization: The top panel shows the complete Deep AutoEncoder with all five layers arranged vertically (top to bottom): Input (9) → Encoder Hidden (6) → Bottleneck (3) → Decoder Hidden (6) → Output (9). The hourglass shape clearly shows the narrowing (compression) and widening (expansion) process, with layer widths proportional to the number of neurons. Input and Output layers (9 neurons) are widest, the Bottleneck (3 neurons) is narrowest, and intermediate layers (6 neurons) are in between. All connections are shown with line thickness representing weight magnitude and color representing sign (blue=positive, red=negative). Neuron colors represent activation levels (darker = higher activation). Only connections with |weight| > 0.3 are shown to reduce visual clutter. The bottleneck layer is highlighted with larger circles to emphasize its importance. The vertical layout makes the compression/expansion process visually intuitive.
- Learning Rate: Adjustable slider (range: 0.01 to 0.5, default: 0.15) that controls the step size for weight updates during backpropagation. Higher values learn faster but may overshoot optimal weights. Lower values are more stable but slower. You can adjust this parameter in real-time using the slider in the control panel. Changes take effect immediately during training.
- Momentum: Adjustable slider (range: 0 to 0.99, default: 0.8) that controls the momentum factor used in momentum-based gradient descent. Higher values maintain more velocity from previous updates, helping the network escape local minima and converge faster. Lower values (closer to 0) reduce momentum effects. You can adjust this parameter in real-time using the slider in the control panel. Changes take effect immediately during training.
- Training Data: The network is trained on a dataset of simple 3×3 patterns: 3 horizontal lines, 3 vertical lines, 2 diagonal patterns, 2 corner patterns, 1 cross pattern, and 1 dot pattern. Total: 12 training samples. These patterns are shuffled each epoch.
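The dataset described above can be sketched like this (the pattern names come from the text; the exact pixel layouts of the corner and diagonal patterns are my assumption):

```python
# The 12 training patterns, each a flat list of 9 pixels (row-major).
def h_line(r):                       # horizontal line in row r
    p = [0] * 9
    p[3 * r:3 * r + 3] = [1, 1, 1]
    return p

def v_line(c):                       # vertical line in column c
    p = [0] * 9
    for r in range(3):
        p[3 * r + c] = 1
    return p

diag_main = [1, 0, 0,  0, 1, 0,  0, 0, 1]
diag_anti = [0, 0, 1,  0, 1, 0,  1, 0, 0]
corner_tl = [1, 1, 0,  1, 0, 0,  0, 0, 0]   # assumed corner shape
corner_br = [0, 0, 0,  0, 0, 1,  0, 1, 1]   # assumed corner shape
cross     = [0, 1, 0,  1, 1, 1,  0, 1, 0]   # centre row + centre column
dot       = [0, 0, 0,  0, 1, 0,  0, 0, 0]   # centre pixel only

dataset = ([h_line(r) for r in range(3)] + [v_line(c) for c in range(3)]
           + [diag_main, diag_anti, corner_tl, corner_br, cross, dot])
print(len(dataset))  # 12
```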
Buttons and Controls
The following are short descriptions of each control:
- Clear: Clears the 3×3 input grid, turning all pixels off. The reconstruction grid and bottleneck bars update immediately to show the network's response to an empty pattern.
- Reconstruction: (Auto-updates) The reconstruction automatically updates whenever you click a pixel in the input grid. The network processes the current 3×3 pattern through the encoder (compressing to 3 bottleneck values), then through the decoder (reconstructing to 9 output values). The reconstruction grid shows grayscale values, the bottleneck bars (positioned above their numerical values) show the compressed code, and the reconstruction error shows how well the network matches the input.
- Learning Rate Slider: Adjustable slider in the control panel that sets the learning rate for training (range: 0.01 to 0.5, default: 0.15). The current value is displayed next to the slider label. You can adjust this during training to experiment with different learning speeds. Higher values learn faster but may overshoot, lower values are more stable but slower.
- Momentum Slider: Adjustable slider in the control panel that sets the momentum factor for training (range: 0 to 0.99, default: 0.8). The current value is displayed next to the slider label. You can adjust this during training to experiment with different momentum effects. Higher values maintain more velocity, helping escape local minima.
- Train: Starts training the network on the dataset. Training uses gradient descent with momentum, using the current Learning Rate and Momentum values from the sliders. The network processes all training samples, updates weights after each sample, and continues for multiple epochs. Training continues until you click "Stop". The epoch counter and loss are displayed in real-time. Watch the network diagram to see weights changing (line thickness and colors) as the network learns to compress and reconstruct patterns.
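One common form of the momentum update used in gradient descent can be sketched as follows (a simplification for a single weight; the actual simulation applies this to every weight and bias):

```python
# Classical momentum: the velocity accumulates a decaying sum of past
# gradients, so consistent gradients build up speed.
def momentum_step(weight, velocity, gradient, lr=0.15, momentum=0.8):
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

w, v = 0.5, 0.0
for _ in range(3):            # pretend the gradient stays at 0.2
    w, v = momentum_step(w, v, 0.2)
print(round(w, 4))            # 0.3428
```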
- Stop: Stops the training process. The network retains the weights learned so far. The "Train" button reappears, allowing you to resume training from where it stopped.
- Reset: Reinitializes all network weights and biases with random values (He initialization). Also resets momentum velocities and the epoch counter to 0. Useful for starting fresh training. The current input pattern remains unchanged.
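He initialization, as mentioned above, draws each weight from a zero-mean Gaussian whose standard deviation depends on the layer's fan-in; a minimal sketch (the seeded generator is only for reproducibility of the example):

```python
import math
import random

# He initialization: std = sqrt(2 / fan_in), biases start at zero.
def he_init(n_in, n_out, rng=random.Random(0)):
    std = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_out)]
               for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases

w, b = he_init(9, 6)   # the encoder hidden layer of this tutorial
print(len(w), len(w[0]), len(b))  # 9 6 6
```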
Key Concepts and Implementation
This simulation demonstrates AutoEncoder learning in a "glass box" architecture where every connection is visible. Here are the key concepts:
- Deep AutoEncoder Architecture: 9 → 6 → 3 → 6 → 9: The complete pipeline processes 9 input pixels through a multi-layer encoder that gradually compresses: 9 → 6 → 3 (the bottleneck), then through a multi-layer decoder that gradually expands: 3 → 6 → 9. This creates an hourglass shape that clearly visualizes the compression and expansion process. The intermediate layers (6 neurons) make the compression/expansion more gradual and easier to understand. Total parameters: 168 - small enough to visualize every single weight, yet complex enough to learn meaningful compression patterns.
- Encoder (Gradual Compression): The encoder has two layers that gradually compress the input:
- Encoder Hidden (9 → 6): First compression step, reduces from 9 to 6 neurons
- Bottleneck (6 → 3): Final compression, reduces from 6 to 3 neurons (the latent code)
Each layer computes weighted sums and applies sigmoid activation. The gradual compression helps the network learn more structured representations.
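The encoder's two steps can be sketched as plain weighted sums followed by a sigmoid (the weight values here are placeholders; the simulation learns its own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# weights[j] holds the incoming weights of output neuron j.
def layer(inputs, weights, biases):
    return [sigmoid(sum(i * w for i, w in zip(inputs, col)) + b)
            for col, b in zip(weights, biases)]

w1 = [[0.1] * 9 for _ in range(6)]; b1 = [0.0] * 6   # 9 -> 6
w2 = [[0.1] * 6 for _ in range(3)]; b2 = [0.0] * 3   # 6 -> 3

pattern = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # top horizontal line
hidden = layer(pattern, w1, b1)          # 6 encoder hidden activations
code = layer(hidden, w2, b2)             # 3 bottleneck values
print(len(hidden), len(code))  # 6 3
```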
- Bottleneck (Latent Code): The bottleneck is the most compressed representation - just 3 numbers that encode all information needed to reconstruct the 9-pixel pattern. This is the "latent code" or "latent representation." The bottleneck forces the network to find efficient encodings. Different patterns produce different bottleneck codes, and similar patterns produce similar codes. The bottleneck is visually highlighted in the network diagram with larger circles. Each of the 3 bottleneck values has practical meaning: after training, each neuron learns to detect different features (e.g., one detects horizontal patterns, another detects vertical patterns, the third detects diagonal/corner patterns). The values (0.0-1.0) indicate how strongly each feature is present. Together, the 3 values form a unique code that identifies the pattern. The visualization shows bars above their numerical values, growing from bottom to top, making it easy to see the compressed representation.
- Decoder (Gradual Expansion): The decoder has two layers that gradually expand the bottleneck:
- Decoder Hidden (3 → 6): First expansion step, expands from 3 to 6 neurons
- Output (6 → 9): Final expansion, expands from 6 to 9 neurons (the reconstruction)
Each layer computes weighted sums and applies sigmoid activation. The gradual expansion mirrors the compression process in reverse.
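The decoder mirrors the encoder: the same sigmoid-layer computation, applied with expanding layer sizes (weights are placeholders, and the example bottleneck code is illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# weights[j] holds the incoming weights of output neuron j.
def layer(inputs, weights, biases):
    return [sigmoid(sum(i * w for i, w in zip(inputs, col)) + b)
            for col, b in zip(weights, biases)]

w3 = [[0.2] * 3 for _ in range(6)]; b3 = [0.0] * 6   # 3 -> 6
w4 = [[0.2] * 6 for _ in range(9)]; b4 = [0.0] * 9   # 6 -> 9

code = [0.9, 0.2, 0.7]                   # an example bottleneck code
hidden = layer(code, w3, b3)             # 6 decoder hidden activations
reconstruction = layer(hidden, w4, b4)   # 9 output pixels
print(len(hidden), len(reconstruction))  # 6 9
```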
- Reconstruction Loss: Unlike classification networks, AutoEncoders use a reconstruction loss. The target is the input itself - the network tries to make the output match the input. Loss = sum of (input[i] - output[i])² over all pixels (proportional to the mean squared error). Lower loss means better reconstruction.
- Sigmoid Activation: Both encoder and decoder use sigmoid activation, which produces values between 0 and 1. This is perfect for binary pixel patterns (on/off) and makes the bottleneck values easy to visualize. Sigmoid(x) = 1 / (1 + e^(-x)).
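The formula above translates directly into code:

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))             # 0.5
print(round(sigmoid(4), 3))   # 0.982 (large inputs saturate near 1)
print(round(sigmoid(-4), 3))  # 0.018 (large negative inputs near 0)
```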
- Weight Visualization: Every connection is drawn in the network diagram showing all five layers arranged vertically (top to bottom). The hourglass shape is created by layer widths proportional to neuron count: Input (9) and Output (9) are widest, Bottleneck (3) is narrowest, and intermediate layers (6) are in between. Line thickness represents weight magnitude, color represents sign (blue=positive, red=negative). Only connections with |weight| > 0.3 are shown to reduce clutter. As training progresses, you can see: (1) How input pixels flow through the encoder hidden layer, (2) How the encoder hidden layer compresses to the bottleneck, (3) How the bottleneck expands to the decoder hidden layer, (4) How the decoder hidden layer expands to the output. The vertical hourglass layout makes it visually clear how information flows through the network, with compression (narrowing) from top to middle and expansion (widening) from middle to bottom.
- Neuron Activation: Neurons are colored based on their activation level (darker = higher activation). Input and output neurons show binary values (0 or 1). Bottleneck neurons show continuous values (0 to 1). The bottleneck visualization uses bars to show activation height, making it easy to see the compressed code.
- Compression Trade-off: The bottleneck creates a fundamental trade-off: smaller bottleneck = more compression but harder reconstruction. With only 3 neurons, the network must find efficient ways to encode patterns. Some information is inevitably lost, but the network learns to preserve the most important features.
- What to Look For: After training, observe: (1) The hourglass shape in the network diagram showing gradual compression and expansion, (2) How the bottleneck encodes different patterns (similar patterns → similar codes), (3) How well the network reconstructs training patterns vs. new patterns, (4) Which input pixels flow through which encoder hidden neurons (thick connections), (5) How the bottleneck expands through decoder hidden neurons (thick connections), (6) The reconstruction error decreasing as training progresses, (7) How intermediate layers (6 neurons) create smoother compression/expansion than direct 9→3→9. This demonstrates how neural networks can learn efficient data representations through gradual compression and expansion without explicit programming.
NOTE: This simulation demonstrates Deep AutoEncoder learning in a completely transparent "glass box" architecture. The Deep AutoEncoder (9 → 6 → 3 → 6 → 9) demonstrates how neural networks can learn to compress data through gradual narrowing and reconstruct it through gradual expansion. The vertical hourglass architecture clearly visualizes the compression/expansion process, with layer widths proportional to neuron count, making it easier to understand than direct 9→3→9 compression. The bottleneck (3 neurons) forces the network to find efficient encodings of 9-pixel patterns, with each bottleneck value learning to detect different features after training. With 168 parameters, the network is small enough to visualize every single connection, making it perfect for understanding how AutoEncoders work. The adjustable Learning Rate and Momentum sliders allow real-time experimentation with training dynamics. After training, you can literally see: (1) The vertical hourglass shape showing gradual compression (top to middle) and expansion (middle to bottom), (2) How different input patterns produce different bottleneck codes (the 3 compressed values, visualized as bars above their numerical values), (3) How the encoder gradually compresses through intermediate layers (thick connections), (4) How the decoder gradually expands through intermediate layers (thick connections), (5) The reconstruction quality improving as the network learns. This demonstrates the fundamental concept behind AutoEncoders used in dimensionality reduction, denoising, and generative models: the network automatically learns efficient data representations through gradual compression and expansion without explicit programming. The bottleneck creates a trade-off between compression and reconstruction quality - smaller bottlenecks compress more but may lose information.
In practice, real-world AutoEncoders use much larger networks with deeper architectures and learnable compression schemes, but the core principle remains the same: encode gradually, compress, decode gradually, reconstruct.