Deep AutoEncoder Tutorial - Multi-Layer Compression & Reconstruction
This interactive tutorial provides a visual simulation of a Deep AutoEncoder neural network with intermediate hidden layers. An AutoEncoder is a type of neural network that learns to compress (encode) input data into a compact representation (bottleneck/latent code) and then reconstruct (decode) the original input from that compressed representation. This demonstrates the fundamental concept of dimensionality reduction and feature learning.
The architecture processes 3×3 pixel patterns (9 pixels) through a multi-layer encoder that gradually compresses them: 9 → 6 → 3 (the bottleneck), then through a multi-layer decoder that gradually expands them: 3 → 6 → 9. This creates an hourglass shape that clearly visualizes the "narrowing down" and "expanding up" process. The resulting 9-6-3-6-9 architecture is: 9 input neurons → 6 encoder hidden neurons → 3 bottleneck neurons → 6 decoder hidden neurons → 9 output neurons. The network is trained to minimize the reconstruction error - the difference between the input and the reconstructed output.
The training uses backpropagation with momentum on a dataset of simple 3×3 patterns (horizontal lines, vertical lines, diagonals, corners, cross). As the network trains, you can observe how it learns to compress different patterns through the narrowing layers into the 3-number bottleneck code, then expand them back through the widening layers. The network diagram shows all five layers and their connections, making the compression/expansion process visually clear.
NOTE: This simulation demonstrates Deep AutoEncoder learning in a "glass box" architecture where every weight and activation is visible. The 9-6-3-6-9 architecture with ~168 total parameters is small enough to visualize every connection, yet complex enough to learn meaningful compression patterns. The intermediate hidden layers (6 neurons) create a gradual compression/expansion that is easier to understand than direct 9→3→9 compression. The bottleneck (3 neurons) forces the network to find efficient representations.
Usage Example
Follow these steps to explore how the AutoEncoder learns to compress and reconstruct patterns:
- Initial State: When you first load the simulation, the network has random weights. The input grid is empty (all pixels off). Click on pixels in the 3×3 input grid to create a pattern. The reconstruction output will initially be poor (random gray values) because the network hasn't learned yet.
- Test Before Training: Click pixels to create a pattern and observe:
- The input grid shows your pattern (black = on, white = off)
- The bottleneck bars (positioned above their numerical values) show the compressed representation (3 values between 0 and 1), growing from bottom to top
- The reconstruction grid shows the network's attempt to recreate your pattern (grayscale values)
- The reconstruction error shows how different the output is from the input
- The network diagram shows all connections in a vertical hourglass layout: Input (9) → Encoder Hidden (6) → Bottleneck (3) → Decoder Hidden (6) → Output (9)
- The Learning Rate and Momentum sliders allow you to adjust training parameters
With random weights, the reconstruction will be poor and the error will be high.
- Start Training: Adjust the Learning Rate and Momentum sliders if desired (defaults: 0.15 and 0.8), then click the "Train" button. Watch as the network learns:
- The loss value decreasing over epochs (shown in the control panel)
- The network diagram updating to show weight changes in the vertical hourglass layout
- After training, test your input pattern again - the reconstruction should be much better
- The bottleneck values will encode meaningful information about the pattern (each value represents a different learned feature)
The network trains on a dataset of simple patterns (lines, diagonals, corners, cross). You can adjust Learning Rate and Momentum during training to experiment with different learning dynamics.
- Observe Compression: After training, try different patterns and observe:
- How the bottleneck (3 numbers) encodes different patterns
- Similar patterns produce similar bottleneck codes
- The reconstruction quality improves significantly after training
- The reconstruction error decreases as the network learns
- Test Reconstruction: Try drawing different patterns:
- Horizontal lines (all pixels in one row)
- Vertical lines (all pixels in one column)
- Diagonal patterns
- Corner patterns
- Cross pattern (center row and column)
- Dot (center pixel only)
- Custom patterns you create
Observe how well the network reconstructs patterns it was trained on vs. new patterns.
- Observe the Hourglass Shape: The network diagram shows the hourglass architecture arranged vertically (top to bottom):
- Input layer (9 neurons) - widest at the top
- Encoder hidden layer (6 neurons) - narrowing down
- Bottleneck layer (3 neurons) - narrowest point in the middle (the compressed code)
- Decoder hidden layer (6 neurons) - expanding up
- Output layer (9 neurons) - widest at the bottom
Layer widths are proportional to neuron count, creating a clear visual hourglass that makes the compression and expansion process intuitive. The vertical layout shows data flowing from top (input) through compression (narrowing) to the bottleneck, then through expansion (widening) to the output at the bottom.
- Understand the Bottleneck: The bottleneck is the key concept:
- It forces the network to compress 9 pixels into just 3 numbers
- The intermediate layers (6 neurons) create a gradual compression/expansion
- This compression must preserve enough information to reconstruct the pattern
- The bottleneck values (0-1) represent the compressed code
- Different patterns produce different bottleneck codes
- Each of the 3 values has practical meaning: after training, each neuron learns to detect different features (horizontal, vertical, diagonal/corner patterns)
- The visualization shows bars above their numerical values, growing from bottom to top, making it easy to see activation levels
- Experiment with Parameters: Try adjusting the Learning Rate and Momentum sliders to see how they affect training:
- Higher Learning Rate (0.3-0.5): Faster learning but may overshoot optimal weights
- Lower Learning Rate (0.01-0.05): Slower but more stable convergence
- Higher Momentum (0.9-0.99): Faster convergence, helps escape local minima
- Lower Momentum (0.3-0.5): More conservative updates
- Reset and Retry: Click "Reset" to reinitialize the network with random weights. Try training again and observe how different initializations can lead to different learned compression schemes, though the final reconstruction quality should be similar.
Tip: The key insight is that the bottleneck forces the network to learn a compact representation. After training, notice how the network can reconstruct training patterns well, but may struggle with completely new patterns. This demonstrates the fundamental trade-off in AutoEncoders: compression vs. reconstruction quality. The bottleneck visualization shows how the network encodes patterns into just 3 numbers.
Parameters
The following are short descriptions of each parameter:
- Input Grid (3×3): A clickable 3×3 grid where you can toggle pixels on/off by clicking. Black pixels are "on" (value 1), white pixels are "off" (value 0). Click any pixel to toggle it. The grid automatically updates the reconstruction as you draw. This is the pattern you want the network to compress and reconstruct.
- Deep AutoEncoder Architecture: The network has five layers creating an hourglass shape:
- Input Layer: 9 neurons (3×3 grid)
- Encoder Hidden Layer: 6 neurons (first compression step - narrowing down)
- Bottleneck Layer: 3 neurons (most compressed - the latent code)
- Decoder Hidden Layer: 6 neurons (first expansion step - expanding up)
- Output Layer: 9 neurons (3×3 reconstruction)
Total parameters: (9×6 + 6) + (6×3 + 3) + (3×6 + 6) + (6×9 + 9) = 60 + 21 + 24 + 63 = 168 weights and biases. This is small enough to visualize every connection clearly, and the intermediate layers make the compression/expansion process more intuitive.
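The parameter count above can be checked with a few lines of code (a sketch in Python; the simulation itself need not be written in Python):

```python
# Parameter count for the 9-6-3-6-9 architecture: each layer
# contributes (inputs x outputs) weights plus one bias per output.
layer_sizes = [9, 6, 3, 6, 9]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # weights + biases

print(total)  # 60 + 21 + 24 + 63 = 168
```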
- Bottleneck (Compressed Code): The middle layer with 3 neurons. This is the compressed representation - the network must encode all information about the 9-pixel pattern into just 3 numbers (each between 0 and 1). The visualization shows three bars, one for each bottleneck neuron, positioned above their numerical values. Bars grow from bottom to top (like traditional bar charts), with height representing activation value. Green bars indicate high activation (>0.5), orange bars indicate low activation. Each bottleneck value has practical meaning: after training, each neuron learns to detect different features (e.g., horizontal patterns, vertical patterns, diagonal/corner patterns). The three values together form a unique code that identifies the input pattern.
- Reconstruction Grid (3×3): The output of the decoder, attempting to recreate the input pattern. Cells are displayed in grayscale: darker = higher activation (closer to 1), lighter = lower activation (closer to 0). The border is highlighted (black) when activation > 0.5. This is read-only - you cannot edit it directly.
- Reconstruction Error: Shows the reconstruction error between the input and the output, computed as the sum of squared differences: Error = sum of (input[i] - output[i])² over all 9 pixels (the mean squared error is this sum divided by 9). Lower values mean better reconstruction; a perfect reconstruction has error = 0.
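The error computation can be sketched as follows (the pattern and the "noisy" reconstruction values are illustrative):

```python
# Reconstruction error: sum of squared differences between the
# 9 input pixels and the 9 reconstructed output values.
def reconstruction_error(inputs, outputs):
    return sum((i - o) ** 2 for i, o in zip(inputs, outputs))

pattern = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # top horizontal line
noisy = [0.9, 0.8, 1.0, 0.1, 0.0, 0.2, 0.0, 0.1, 0.0]

print(reconstruction_error(pattern, pattern))  # 0 (perfect reconstruction)
print(reconstruction_error(pattern, noisy))    # about 0.11
```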
- Network Visualization: The top panel shows the complete Deep AutoEncoder with all five layers arranged vertically (top to bottom): Input (9) → Encoder Hidden (6) → Bottleneck (3) → Decoder Hidden (6) → Output (9). The hourglass shape clearly shows the narrowing (compression) and widening (expansion) process, with layer widths proportional to the number of neurons. Input and Output layers (9 neurons) are widest, the Bottleneck (3 neurons) is narrowest, and intermediate layers (6 neurons) are in between. All connections are shown with line thickness representing weight magnitude and color representing sign (blue=positive, red=negative). Neuron colors represent activation levels (darker = higher activation). Only connections with |weight| > 0.3 are shown to reduce visual clutter. The bottleneck layer is highlighted with larger circles to emphasize its importance. The vertical layout makes the compression/expansion process visually intuitive.
- Learning Rate: Adjustable slider (range: 0.01 to 0.5, default: 0.15) that controls the step size for weight updates during backpropagation. Higher values learn faster but may overshoot optimal weights. Lower values are more stable but slower. You can adjust this parameter in real-time using the slider in the control panel. Changes take effect immediately during training.
- Momentum: Adjustable slider (range: 0 to 0.99, default: 0.8) that controls the momentum factor used in momentum-based gradient descent. Higher values maintain more velocity from previous updates, helping the network escape local minima and converge faster. Lower values (closer to 0) reduce momentum effects. You can adjust this parameter in real-time using the slider in the control panel. Changes take effect immediately during training.
- Training Data: The network is trained on a dataset of simple 3×3 patterns: 3 horizontal lines, 3 vertical lines, 2 diagonal patterns, 2 corner patterns, 1 cross pattern, and 1 dot pattern. Total: 12 training samples. These patterns are shuffled each epoch.
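The dataset described above can be sketched like this (the pattern names come from the text; the exact pixel layouts of the corner and diagonal patterns are my assumption):

```python
# The 12 training patterns, each a flat list of 9 pixels (row-major).
def h_line(r):                       # horizontal line in row r
    p = [0] * 9
    p[3 * r:3 * r + 3] = [1, 1, 1]
    return p

def v_line(c):                       # vertical line in column c
    p = [0] * 9
    for r in range(3):
        p[3 * r + c] = 1
    return p

diag_main = [1, 0, 0,  0, 1, 0,  0, 0, 1]
diag_anti = [0, 0, 1,  0, 1, 0,  1, 0, 0]
corner_tl = [1, 1, 0,  1, 0, 0,  0, 0, 0]   # assumed corner shape
corner_br = [0, 0, 0,  0, 0, 1,  0, 1, 1]   # assumed corner shape
cross     = [0, 1, 0,  1, 1, 1,  0, 1, 0]   # centre row + centre column
dot       = [0, 0, 0,  0, 1, 0,  0, 0, 0]   # centre pixel only

dataset = ([h_line(r) for r in range(3)] + [v_line(c) for c in range(3)]
           + [diag_main, diag_anti, corner_tl, corner_br, cross, dot])
print(len(dataset))  # 12
```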
Buttons and Controls
The following are short descriptions of each control:
- Clear: Clears the 3×3 input grid, turning all pixels off. The reconstruction grid and bottleneck bars update immediately to show the network's response to an empty pattern.
- Reconstruction: (Auto-updates) The reconstruction automatically updates whenever you click a pixel in the input grid. The network processes the current 3×3 pattern through the encoder (compressing to 3 bottleneck values), then through the decoder (reconstructing to 9 output values). The reconstruction grid shows grayscale values, the bottleneck bars (positioned above their numerical values) show the compressed code, and the reconstruction error shows how well the network matches the input.
- Learning Rate Slider: Adjustable slider in the control panel that sets the learning rate for training (range: 0.01 to 0.5, default: 0.15). The current value is displayed next to the slider label. You can adjust this during training to experiment with different learning speeds. Higher values learn faster but may overshoot, lower values are more stable but slower.
- Momentum Slider: Adjustable slider in the control panel that sets the momentum factor for training (range: 0 to 0.99, default: 0.8). The current value is displayed next to the slider label. You can adjust this during training to experiment with different momentum effects. Higher values maintain more velocity, helping escape local minima.
- Train: Starts training the network on the dataset. Training uses gradient descent with momentum, using the current Learning Rate and Momentum values from the sliders. The network processes all training samples, updates weights after each sample, and continues for multiple epochs. Training continues until you click "Stop". The epoch counter and loss are displayed in real-time. Watch the network diagram to see weights changing (line thickness and colors) as the network learns to compress and reconstruct patterns.
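One common form of the momentum update used in gradient descent can be sketched as follows (a simplification for a single weight; the actual simulation applies this to every weight and bias):

```python
# Classical momentum: the velocity accumulates a decaying sum of past
# gradients, so consistent gradients build up speed.
def momentum_step(weight, velocity, gradient, lr=0.15, momentum=0.8):
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

w, v = 0.5, 0.0
for _ in range(3):            # pretend the gradient stays at 0.2
    w, v = momentum_step(w, v, 0.2)
print(round(w, 4))            # 0.3428
```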
- Stop: Stops the training process. The network retains the weights learned so far. The "Train" button reappears, allowing you to resume training from where it stopped.
- Reset: Reinitializes all network weights and biases with random values (He initialization). Also resets momentum velocities and the epoch counter to 0. Useful for starting fresh training. The current input pattern remains unchanged.
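He initialization, as mentioned above, draws each weight from a zero-mean Gaussian whose standard deviation depends on the layer's fan-in; a minimal sketch (the seeded generator is only for reproducibility of the example):

```python
import math
import random

# He initialization: std = sqrt(2 / fan_in), biases start at zero.
def he_init(n_in, n_out, rng=random.Random(0)):
    std = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_out)]
               for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases

w, b = he_init(9, 6)   # the encoder hidden layer of this tutorial
print(len(w), len(w[0]), len(b))  # 9 6 6
```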
Key Concepts and Implementation
This simulation demonstrates AutoEncoder learning in a "glass box" architecture where every connection is visible. Here are the key concepts:
- Deep AutoEncoder Architecture: 9 → 6 → 3 → 6 → 9: The complete pipeline processes 9 input pixels through a multi-layer encoder that gradually compresses: 9 → 6 → 3 (the bottleneck), then through a multi-layer decoder that gradually expands: 3 → 6 → 9. This creates an hourglass shape that clearly visualizes the compression and expansion process. The intermediate layers (6 neurons) make the compression/expansion more gradual and easier to understand. Total parameters: 168 - small enough to visualize every single weight, yet complex enough to learn meaningful compression patterns.
- Encoder (Gradual Compression): The encoder has two layers that gradually compress the input:
- Encoder Hidden (9 → 6): First compression step, reduces from 9 to 6 neurons
- Bottleneck (6 → 3): Final compression, reduces from 6 to 3 neurons (the latent code)
Each layer computes weighted sums and applies sigmoid activation. The gradual compression helps the network learn more structured representations.
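The encoder's two steps can be sketched as plain weighted sums followed by a sigmoid (the weight values here are placeholders; the simulation learns its own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# weights[j] holds the incoming weights of output neuron j.
def layer(inputs, weights, biases):
    return [sigmoid(sum(i * w for i, w in zip(inputs, col)) + b)
            for col, b in zip(weights, biases)]

w1 = [[0.1] * 9 for _ in range(6)]; b1 = [0.0] * 6   # 9 -> 6
w2 = [[0.1] * 6 for _ in range(3)]; b2 = [0.0] * 3   # 6 -> 3

pattern = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # top horizontal line
hidden = layer(pattern, w1, b1)          # 6 encoder hidden activations
code = layer(hidden, w2, b2)             # 3 bottleneck values
print(len(hidden), len(code))  # 6 3
```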
- Bottleneck (Latent Code): The bottleneck is the most compressed representation - just 3 numbers that encode all information needed to reconstruct the 9-pixel pattern. This is the "latent code" or "latent representation." The bottleneck forces the network to find efficient encodings. Different patterns produce different bottleneck codes, and similar patterns produce similar codes. The bottleneck is visually highlighted in the network diagram with larger circles. Each of the 3 bottleneck values has practical meaning: after training, each neuron learns to detect different features (e.g., one detects horizontal patterns, another detects vertical patterns, the third detects diagonal/corner patterns). The values (0.0-1.0) indicate how strongly each feature is present. Together, the 3 values form a unique code that identifies the pattern. The visualization shows bars above their numerical values, growing from bottom to top, making it easy to see the compressed representation.
- Decoder (Gradual Expansion): The decoder has two layers that gradually expand the bottleneck:
- Decoder Hidden (3 → 6): First expansion step, expands from 3 to 6 neurons
- Output (6 → 9): Final expansion, expands from 6 to 9 neurons (the reconstruction)
Each layer computes weighted sums and applies sigmoid activation. The gradual expansion mirrors the compression process in reverse.
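The decoder mirrors the encoder: the same sigmoid-layer computation, applied with expanding layer sizes (weights are placeholders, and the example bottleneck code is illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# weights[j] holds the incoming weights of output neuron j.
def layer(inputs, weights, biases):
    return [sigmoid(sum(i * w for i, w in zip(inputs, col)) + b)
            for col, b in zip(weights, biases)]

w3 = [[0.2] * 3 for _ in range(6)]; b3 = [0.0] * 6   # 3 -> 6
w4 = [[0.2] * 6 for _ in range(9)]; b4 = [0.0] * 9   # 6 -> 9

code = [0.9, 0.2, 0.7]                   # an example bottleneck code
hidden = layer(code, w3, b3)             # 6 decoder hidden activations
reconstruction = layer(hidden, w4, b4)   # 9 output pixels
print(len(hidden), len(reconstruction))  # 6 9
```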
- Reconstruction Loss: Unlike classification networks, AutoEncoders use a reconstruction loss. The target is the input itself - the network tries to make the output match the input. Loss = sum of (input[i] - output[i])² over all pixels (proportional to the mean squared error). Lower loss means better reconstruction.
- Sigmoid Activation: Both encoder and decoder use sigmoid activation, which produces values between 0 and 1. This is perfect for binary pixel patterns (on/off) and makes the bottleneck values easy to visualize. Sigmoid(x) = 1 / (1 + e^(-x)).
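The formula above translates directly into code:

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))             # 0.5
print(round(sigmoid(4), 3))   # 0.982 (large inputs saturate near 1)
print(round(sigmoid(-4), 3))  # 0.018 (large negative inputs near 0)
```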
- Weight Visualization: Every connection is drawn in the network diagram showing all five layers arranged vertically (top to bottom). The hourglass shape is created by layer widths proportional to neuron count: Input (9) and Output (9) are widest, Bottleneck (3) is narrowest, and intermediate layers (6) are in between. Line thickness represents weight magnitude, color represents sign (blue=positive, red=negative). Only connections with |weight| > 0.3 are shown to reduce clutter. As training progresses, you can see: (1) How input pixels flow through the encoder hidden layer, (2) How the encoder hidden layer compresses to the bottleneck, (3) How the bottleneck expands to the decoder hidden layer, (4) How the decoder hidden layer expands to the output. The vertical hourglass layout makes it visually clear how information flows through the network, with compression (narrowing) from top to middle and expansion (widening) from middle to bottom.
- Neuron Activation: Neurons are colored based on their activation level (darker = higher activation). Input and output neurons show binary values (0 or 1). Bottleneck neurons show continuous values (0 to 1). The bottleneck visualization uses bars to show activation height, making it easy to see the compressed code.
- Compression Trade-off: The bottleneck creates a fundamental trade-off: smaller bottleneck = more compression but harder reconstruction. With only 3 neurons, the network must find efficient ways to encode patterns. Some information is inevitably lost, but the network learns to preserve the most important features.
- What to Look For: After training, observe: (1) The hourglass shape in the network diagram showing gradual compression and expansion, (2) How the bottleneck encodes different patterns (similar patterns → similar codes), (3) How well the network reconstructs training patterns vs. new patterns, (4) Which input pixels flow through which encoder hidden neurons (thick connections), (5) How the bottleneck expands through decoder hidden neurons (thick connections), (6) The reconstruction error decreasing as training progresses, (7) How intermediate layers (6 neurons) create smoother compression/expansion than direct 9→3→9. This demonstrates how neural networks can learn efficient data representations through gradual compression and expansion without explicit programming.
NOTE: This simulation demonstrates Deep AutoEncoder learning in a completely transparent "glass box" architecture. The Deep AutoEncoder (9 → 6 → 3 → 6 → 9) demonstrates how neural networks can learn to compress data through gradual narrowing and reconstruct it through gradual expansion. The vertical hourglass architecture clearly visualizes the compression/expansion process, with layer widths proportional to neuron count, making it easier to understand than direct 9→3→9 compression. The bottleneck (3 neurons) forces the network to find efficient encodings of 9-pixel patterns, with each bottleneck value learning to detect different features after training. With 168 parameters, the network is small enough to visualize every single connection, making it perfect for understanding how AutoEncoders work. The adjustable Learning Rate and Momentum sliders allow real-time experimentation with training dynamics. After training, you can literally see: (1) The vertical hourglass shape showing gradual compression (top to middle) and expansion (middle to bottom), (2) How different input patterns produce different bottleneck codes (the 3 compressed values, visualized as bars above their numerical values), (3) How the encoder gradually compresses through intermediate layers (thick connections), (4) How the decoder gradually expands through intermediate layers (thick connections), (5) The reconstruction quality improving as the network learns. This demonstrates the fundamental concept behind AutoEncoders used in dimensionality reduction, denoising, and generative models: the network automatically learns efficient data representations through gradual compression and expansion without explicit programming. The bottleneck creates a trade-off between compression and reconstruction quality - smaller bottlenecks compress more but may lose information.
In practice, real-world AutoEncoders use much larger networks with deeper architectures and learnable compression schemes, but the core principle remains the same: encode gradually, compress, decode gradually, reconstruct.