This interactive tutorial provides a step-by-step visualization of how Convolution and Max Pooling operations work in Convolutional Neural Networks (CNNs). Watch as an 8×8 input image is processed through a 3×3 convolution kernel to produce a 6×6 feature map, then through 2×2 max pooling to produce a 3×3 output. This is a standalone visualization tool focused entirely on the mechanics of Convolution and Pooling—no training logic, just pure matrix operations with animation steps.
The tutorial shows each operation in detail: you can see exactly which input pixels contribute to each convolution output, how the kernel slides across the input, how ReLU activation is applied, and how max pooling selects the maximum value from each 2×2 window. Use the step controls to move forward or backward through the process, or click "Run" to watch the complete animation. You can also draw your own patterns or select from predefined patterns (vertical line, horizontal line, cross, box, letters, random noise, and more).
NOTE: This is a pure visualization tool for understanding CNN operations. The convolution kernels are predefined (simple edge/feature detectors, not learned), and all operations are shown step by step with mathematical explanations.
Input (8×8) ➜ Convolution, Kernel (3×3) ➜ Conv Output (6×6) ➜ Max Pool (2×2, Stride 2) ➜ Pooled (3×3)
Usage Example
Follow these steps to explore how Convolution and Max Pooling operations work:
Initial State: When you first load the tutorial, a vertical line pattern is loaded by default. The 8×8 input grid shows the pattern, and the convolution and pooling outputs are empty. You can click on pixels in the 8×8 grid to create your own pattern, or select a predefined pattern from the dropdown menu.
Step Through Operations: Click "Step Fwd" to move forward one step at a time. You'll see:
Yellow highlights on the input grid showing which 3×3 region is being convolved
Green highlight on the convolution output cell being calculated
The mathematical formula in the bottom panel showing the convolution calculation
Values appearing in the 6×6 convolution output grid
Watch Convolution: As you step through the convolution phase (36 steps), observe:
How the 3×3 kernel slides across the 8×8 input (left to right, top to bottom)
The element-wise multiplication: each input pixel is multiplied by the corresponding kernel value
The sum of all products, which becomes the convolution output
ReLU activation: negative values become 0
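The per-cell arithmetic described above can be sketched in plain Python (function and variable names are illustrative, not taken from the tutorial's code):

```python
# One convolution step at output position (row, col), assuming `image` is an
# 8x8 list of lists and `kernel` is a 3x3 list of lists.
def conv_step(image, kernel, row, col):
    total = 0
    for i in range(3):
        for j in range(3):
            # Element-wise multiply each input pixel with the kernel weight
            total += image[row + i][col + j] * kernel[i][j]
    return max(0, total)  # ReLU: negative sums become 0

# A vertical line in column 3 with the default edge-detector kernel:
image = [[1 if c == 3 else 0 for c in range(8)] for r in range(8)]
kernel = [[0, 1, 0],
          [1, 2, 1],
          [0, 1, 0]]
print(conv_step(image, kernel, 0, 2))  # line under the kernel's centre column -> 4
print(conv_step(image, kernel, 0, 5))  # window misses the line entirely -> 0
```

This is exactly the calculation the bottom panel spells out at each step: nine products, one sum, one ReLU.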
Watch Max Pooling: After convolution completes, the pooling phase begins (9 steps). Observe:
Yellow highlights on the 6×6 convolution output showing which 2×2 window is being pooled
Green highlight on the 3×3 pooled output cell being calculated
The mathematical formula showing the max operation
How the maximum value from each 2×2 window is selected
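A single pooling step is even simpler; the sketch below assumes a list-of-lists grid standing in for the 6×6 convolution output:

```python
# One max-pooling step producing output cell (out_row, out_col): gather the
# 2x2 window starting at (out_row*stride, out_col*stride) and take its max.
def pool_step(conv_out, out_row, out_col, stride=2):
    r, c = out_row * stride, out_col * stride
    window = [conv_out[r][c],     conv_out[r][c + 1],
              conv_out[r + 1][c], conv_out[r + 1][c + 1]]
    return max(window)

grid = [[0, 1, 0, 3],
        [2, 4, 1, 0],
        [0, 0, 5, 2],
        [1, 0, 2, 6]]
print(pool_step(grid, 0, 0))  # max of [0, 1, 2, 4] -> 4
print(pool_step(grid, 1, 1))  # max of [5, 2, 2, 6] -> 6
```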
Run Animation: Click "Run" to watch the complete process automatically. Adjust the speed slider to control animation speed (50ms = fast, 1000ms = slow). Click "Pause" to stop at any point.
Try Different Patterns: Select different patterns from the dropdown:
Vertical Line: See how vertical structures are detected
Horizontal Line: See how horizontal structures are detected
Cross: See how both vertical and horizontal features combine
Square Box: Hollow box outline - see how edge detection works on enclosed regions
Letter 'A': Classic pixel art letter A - demonstrates complex shape recognition
Letter 'X': Diagonal cross pattern - shows diagonal edge detection
Letter 'O': Circular/oval outline - demonstrates curved edge detection
Each pattern produces a different feature map under the same kernel, showing how one detector responds to different structures; conversely, switching kernels on the same pattern demonstrates how different kernels extract different features.
Adjust Stride Values: Use the Conv Stride and Pool Stride dropdowns to change how the operations move:
Conv Stride: Controls how far the kernel moves between convolution operations (1, 2, or 3). Higher stride = smaller output grid.
Pool Stride: Controls how far the pooling window moves (1 or 2). Higher stride = smaller output grid.
Changing stride values automatically recalculates the output grid dimensions. Notice how stride affects the size of the final feature map.
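The dimension rule the tutorial applies when you change either stride can be checked with a few lines of Python (a sketch of the formula, not the tutorial's code):

```python
# Output-size rule used by both dropdowns: out = floor((in - window)/stride) + 1
def output_size(input_size, window_size, stride):
    return (input_size - window_size) // stride + 1

# Convolution output for each Conv Stride option (8x8 input, 3x3 kernel):
for s in (1, 2, 3):
    print(s, output_size(8, 3, s))  # 1 -> 6, 2 -> 3, 3 -> 2

# Pooling output then depends on the conv result (2x2 window):
print(output_size(6, 2, 2))  # stride-1 conv followed by stride-2 pool -> 3
```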
Observe Gradient Colors: The convolution and pooling output grids use gradient color coding:
Light blue = low values (weak activations)
Dark blue = high values (strong activations)
The gradient makes it easy to see which regions have the strongest feature responses
Step Backward: Use "Step Back" to go backward through the operations. This is useful for reviewing specific steps or understanding how a particular output value was calculated.
Reset: Click "Reset" to clear all outputs and return to the initial state. The input pattern remains, but all convolution and pooling calculations are cleared.
Tip: Pay attention to the mathematical formulas in the bottom panel. They show exactly how each output value is calculated. Notice how the convolution kernel emphasizes certain patterns (edges, corners) and how max pooling reduces dimensionality while preserving the strongest activations.
Parameters
The following are short descriptions of each parameter:
Input Grid (8×8): A clickable 8×8 grid where you can toggle pixels on/off by clicking. Dark pixels are "on" (value 1), white pixels are "off" (value 0). Click any pixel to toggle it. The grid is locked during animation (when currentStep ≥ 0). You can also select predefined patterns from the dropdown menu.
Convolution Kernel (3×3): A selectable 3×3 kernel displayed between the input and convolution output. You can choose from several predefined kernels:
Edge Detector (default):
[0 1 0] [1 2 1] [0 1 0]
- Detects edges and local patterns
Blur:
[1 1 1] [1 1 1] [1 1 1]
- Smoothing/averaging kernel
Sharpen:
[0 -1 0] [-1 5 -1] [0 -1 0]
- Enhances edges
Vertical Edge:
[-1 0 1] [-1 0 1] [-1 0 1]
- Detects vertical edges
Horizontal Edge:
[-1 -1 -1] [0 0 0] [1 1 1]
- Detects horizontal edges
The kernel slides across the input, performing element-wise multiplication and summing the results. Changing the kernel changes what features are detected.
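The five predefined kernels can be written out as plain matrices, and applying two of them to the same input shows why the choice matters (this is a sketch; the dictionary keys and helper are illustrative, not the tutorial's code):

```python
# The five predefined kernels from the dropdown, as plain 3x3 matrices.
KERNELS = {
    "edge":       [[0, 1, 0], [1, 2, 1], [0, 1, 0]],
    "blur":       [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    "sharpen":    [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
    "vertical":   [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

def apply_at(image, kernel, r, c):
    # Element-wise multiply, sum, then ReLU
    s = sum(image[r + i][c + j] * kernel[i][j]
            for i in range(3) for j in range(3))
    return max(0, s)

# A vertical line excites the vertical-edge kernel on its flank,
# but cancels out entirely under the horizontal-edge kernel:
image = [[1 if c == 3 else 0 for c in range(8)] for r in range(8)]
print(apply_at(image, KERNELS["vertical"], 0, 1))    # -> 3
print(apply_at(image, KERNELS["horizontal"], 0, 1))  # -> 0
```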
Convolution Stride: Controls how far the kernel moves between convolution operations. Options: 1 (default), 2, or 3. With stride 1, the kernel moves one pixel at a time. With stride 2, it moves two pixels, producing a smaller output. The output dimension is calculated as: floor((8 - 3)/stride) + 1. Changing stride automatically updates the output grid size.
Convolution Output: The result of applying the 3×3 convolution kernel to the 8×8 input. The output size depends on the stride (e.g., 6×6 with stride 1, 3×3 with stride 2, 2×2 with stride 3). Each output cell is calculated by:
Placing the kernel over a 3×3 region of the input
Multiplying each input pixel by the corresponding kernel value
Summing all products
Applying ReLU activation: max(0, sum) - negative values become 0
The kernel slides from top-left to bottom-right according to the stride setting. Output cells are color-coded with a blue gradient: light blue for low values, dark blue for high values, making it easy to see activation strength.
Pooling Stride: Controls how far the pooling window moves between pooling operations. Options: 1 or 2 (default). With stride 2, windows don't overlap. With stride 1, windows overlap by one pixel. The output dimension is calculated as: floor((convDim - 2)/stride) + 1. Changing stride automatically updates the output grid size.
Max Pooling (2×2): The convolution output is downsampled using 2×2 max pooling. With the default stride of 2, this means:
Divide the 6×6 grid into non-overlapping 2×2 windows
For each window, select the maximum value
Place that maximum value in the corresponding position in the 3×3 output
This reduces the convolution output grid size while preserving the strongest activations. The exact reduction depends on the convolution output size and pooling stride. Output cells are color-coded with the same blue gradient as the convolution output, making it easy to compare activation strengths.
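The whole downsampling pass can be sketched in a few lines of Python (names are illustrative; the tutorial's own implementation may differ):

```python
# Max pooling over a square grid: slide a `window` x `window` box by `stride`
# and keep the maximum of each box.
def max_pool(grid, window=2, stride=2):
    out_dim = (len(grid) - window) // stride + 1
    return [[max(grid[r * stride + i][c * stride + j]
                 for i in range(window) for j in range(window))
             for c in range(out_dim)]
            for r in range(out_dim)]

# A 6x6 feature map with a strong vertical band pools down to 3x3,
# keeping the band's strongest activations:
conv_out = [[0, 0, 4, 4, 0, 0]] * 6
print(max_pool(conv_out))  # [[0, 4, 0], [0, 4, 0], [0, 4, 0]]
```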
Animation Speed: Controls how fast the step-by-step animation runs when you click "Run". Range: 50ms (very fast) to 1000ms (slow). Default: 200ms. You can adjust this even while the animation is running.
Mathematical Display: The bottom panel shows the mathematical formula for the current operation. During convolution, it shows the element-wise products and their sum. During pooling, it shows the values in the 2×2 window and the selected maximum.
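A formula string like the one in the bottom panel could be assembled as follows (the exact text format is an assumption, not the tutorial's markup):

```python
# Build a human-readable convolution formula from a 3x3 input window and
# kernel: all nine products, their sum, and the ReLU result.
def conv_formula(window, kernel):
    terms = [f"{window[i][j]}*{kernel[i][j]}"
             for i in range(3) for j in range(3)]
    total = sum(window[i][j] * kernel[i][j]
                for i in range(3) for j in range(3))
    return f"{' + '.join(terms)} = {total}, ReLU -> {max(0, total)}"

window = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]   # right edge of a vertical line
kernel = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]   # default edge detector
print(conv_formula(window, kernel))
```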
Buttons and Controls
The following are short descriptions of each control:
Load Pattern: Dropdown menu to select predefined patterns. Options include: Vertical Line, Horizontal Line, Cross, Square Box (hollow), Letter 'A', Letter 'X', Letter 'O', Triangle (hollow), Diamond (hollow), Plus Sign, Arrow, Checkmark, Random Noise, or Clear (blank grid). All patterns are designed using pixel art principles for clear recognition on the 8×8 grid. Selecting a pattern automatically resets the simulation and loads the new pattern.
Kernel: Dropdown menu to select the convolution kernel. Options: Edge Detector (default), Blur, Sharpen, Vertical Edge, or Horizontal Edge. Each kernel detects different types of features. Changing the kernel automatically updates the kernel display and resets the simulation.
Conv Stride: Dropdown to select convolution stride (1, 2, or 3). Controls how far the kernel moves between operations. Changing stride recalculates output dimensions and resets the simulation.
Pool Stride: Dropdown to select pooling stride (1 or 2). Controls how far the pooling window moves. Changing stride recalculates output dimensions and resets the simulation.
Step Back (❮): Moves backward one step in the animation. Useful for reviewing previous operations or understanding how a value was calculated. If at the beginning (step -1), this button has no effect.
Run (▶): Starts automatic animation of all operations. With the default strides, the animation steps through all 36 convolution steps, then all 9 pooling steps, and stops automatically at the end. Click "Pause" to stop manually.
Pause (❚❚): Stops the automatic animation. The current step is preserved, and you can continue with "Step Fwd" or "Step Back" manually, or click "Run" again to resume automatic animation.
Step Fwd (❯): Moves forward one step in the animation. Each step shows one operation: either one convolution calculation or one pooling calculation. The highlights and mathematical formulas update to show the current operation.
Reset: Clears all convolution and pooling outputs, resets the current step to -1 (idle state), and stops any running animation. The input pattern remains unchanged. Useful for starting over with the same input pattern.
Speed Slider: Controls the animation speed when using "Run". Range: 50ms (very fast) to 1000ms (slow). The value represents milliseconds between steps. You can adjust this even while animation is running.
Key Concepts and Implementation
This tutorial demonstrates the fundamental operations of Convolutional Neural Networks: Convolution and Max Pooling. Here are the key concepts:
Convolution Operation: Convolution is a mathematical operation that slides a small kernel (filter) across an input image, computing the dot product at each position. In this tutorial:
The 3×3 kernel slides across the 8×8 input
At each position, the kernel is element-wise multiplied with the overlapping 3×3 region
All products are summed to produce one output value
ReLU activation is applied: negative values become 0
With stride 1, this produces a 6×6 output (8-3+1 = 6 in each dimension)
Convolution extracts local features like edges, corners, and patterns. The kernel acts as a feature detector.
Kernel (Filter): The kernel is a small matrix of weights that defines what features to detect. In this tutorial, you can select from several predefined kernels:
Edge Detector: Emphasizes the center pixel and neighbors, detecting local patterns and edges
Blur: Averages neighboring pixels, creating a smoothing effect
Sharpen: Enhances edges by subtracting neighboring values from a weighted center
Vertical Edge: Specifically detects vertical edges by comparing left and right neighbors
Horizontal Edge: Specifically detects horizontal edges by comparing top and bottom neighbors
Each kernel produces different feature maps from the same input, demonstrating how different kernels extract different information. In real CNNs, kernels are learned during training, but here we use fixed kernels for demonstration.
Stride: Stride controls the step size of the sliding window. For convolution, stride determines how far the kernel moves between operations. For pooling, stride determines how far the pooling window moves. Higher stride values produce smaller output grids but reduce computation. The relationship is: output_size = floor((input_size - window_size)/stride) + 1. This tutorial allows you to experiment with different stride values to see how they affect the output dimensions.
Gradient Color Coding: The convolution and pooling output grids use a blue gradient to visualize activation strength:
Light blue (#f0f8ff) = low values (0-1) - weak activations
Medium blue (#50a8ff) = medium values (5-6) - moderate activations
Dark blue (#0044cc) = high values (10+) - strong activations
This visual encoding makes it immediately clear which regions of the feature map have the strongest responses to the input pattern.
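One plausible way to compute such a gradient is linear interpolation between the two hex anchors quoted above; the interpolation scheme and value cap are assumptions, not taken from the tutorial:

```python
# Map an activation value to a blue shade between #f0f8ff (weak) and
# #0044cc (strong), clamping values above v_max to the darkest shade.
def lerp(a, b, t):
    return round(a + (b - a) * t)

def value_to_color(v, v_max=10):
    t = max(0.0, min(1.0, v / v_max))          # clamp into [0, 1]
    light, dark = (0xF0, 0xF8, 0xFF), (0x00, 0x44, 0xCC)
    r, g, b = (lerp(lo, hi, t) for lo, hi in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"

print(value_to_color(0))   # #f0f8ff (light blue, weak activation)
print(value_to_color(10))  # #0044cc (dark blue, strong activation)
```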
ReLU Activation: After convolution, ReLU (Rectified Linear Unit) is applied: ReLU(x) = max(0, x). This means:
Positive values pass through unchanged
Negative values become 0
ReLU introduces non-linearity, which is essential for neural networks to learn complex patterns. It also helps with gradient flow during training.
Max Pooling: Pooling is a downsampling operation that reduces the spatial dimensions of the feature map. Max pooling:
Divides the input into non-overlapping windows (2×2 in this case)
Selects the maximum value from each window
Places that maximum in the output
Reduces 6×6 to 3×3 (stride 2 means windows don't overlap)
Max pooling reduces computation, provides translation invariance (the feature can appear anywhere in the window), and preserves the strongest activations.
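Putting the two operations together reproduces the tutorial's default pipeline end to end: 8×8 input → 3×3 convolution with ReLU → 6×6 feature map → 2×2 max pooling with stride 2 → 3×3 output. This is a compact sketch, not the tutorial's own code:

```python
# Convolution with ReLU: slide a kxk kernel by `stride` over an nxn image.
def convolve(img, k, stride=1):
    n, m = len(img), len(k)
    d = (n - m) // stride + 1
    return [[max(0, sum(img[r*stride + i][c*stride + j] * k[i][j]
                        for i in range(m) for j in range(m)))
             for c in range(d)] for r in range(d)]

# Max pooling: slide a wxw window by `stride`, keep the max of each window.
def max_pool(grid, w=2, stride=2):
    d = (len(grid) - w) // stride + 1
    return [[max(grid[r*stride + i][c*stride + j]
                 for i in range(w) for j in range(w))
             for c in range(d)] for r in range(d)]

img = [[1 if c == 3 else 0 for c in range(8)] for r in range(8)]  # vertical line
edge = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]                          # default kernel
fmap = convolve(img, edge)   # 6x6 feature map
pooled = max_pool(fmap)      # 3x3 pooled output
print(len(fmap), len(pooled))  # 6 3
print(pooled)  # the line survives pooling as a strong middle column
```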
Step-by-Step Visualization: The tutorial shows each operation individually:
Yellow highlights (semi-transparent) show the input region being processed (3×3 for convolution, 2×2 for pooling). The transparency allows you to see the underlying pattern.
Green highlights show the output cell being calculated
Mathematical formulas show the exact calculation being performed, including all 9 multiplication terms for convolution
Values appear in the output grids as they are calculated, with gradient color coding to show activation strength
Dynamic grid sizes adjust automatically when you change stride values
This step-by-step approach makes it easy to understand how each output value is computed.
Why These Operations Matter: Convolution and pooling are the building blocks of CNNs:
Convolution extracts local features (edges, textures, patterns) from raw pixels
Pooling reduces dimensionality, making the network more efficient and providing translation invariance
Together, they transform raw pixels into meaningful features that can be classified
In real CNNs, multiple layers of convolution and pooling create hierarchical features (simple edges → complex shapes → objects)
What to Observe: As you step through the operations, notice:
How the convolution kernel emphasizes certain patterns in the input
How different input patterns produce different convolution outputs
How max pooling preserves the strongest activations while reducing size
How the final 3×3 output captures essential features from the original 8×8 input
This demonstrates how CNNs automatically extract meaningful features from images without explicit programming.
NOTE: This tutorial focuses exclusively on the mechanics of Convolution and Max Pooling operations in CNNs. It is a pure visualization tool with no training logic—just step-by-step matrix operations with detailed mathematical explanations. The convolution kernels are predefined (not learned), and all operations are shown transparently. This makes it perfect for understanding the fundamental building blocks of CNNs before moving on to full networks with learnable parameters. In real CNNs, multiple convolution and pooling layers are stacked, and the kernels are learned during training to detect task-specific features. However, the core operations (convolution and pooling) remain the same. This tutorial demonstrates how these operations transform raw pixels into meaningful features that can be used for classification or other tasks.