Web Simulator | ShareTechnote

2D Convolution Visualization Tutorial

This interactive tutorial demonstrates how 2D convolution works through an animated visualization. 2D convolution is a fundamental operation in image processing and deep learning (especially convolutional neural networks for computer vision). The tutorial shows how a kernel (filter) slides across a 2D input matrix, computing the dot product at each position to produce an output matrix.

The visualization consists of three main panels arranged horizontally: (1) Input Matrix (left) - a 32×32 grid showing the selected input pattern in blue shades (image-like display without text values), (2) Kernel (center) - a grid showing the filter weights in red/orange shades (3×3 by default, 7×7 for the Large Scale Effects kernels), and (3) Output Matrix (right) - a grid that fills pixel-by-pixel as the kernel scans (30×30 for a 3×3 kernel; image-like display without text values). A yellow highlight box the size of the kernel slides over the Input Matrix, and simultaneously the corresponding pixel in the Output Matrix lights up. The visualization uses a dark theme (black background) with bright colors for visibility. The 32×32 grid is large enough to show recognizable patterns and downsampled versions of real images.

You can select from different input patterns (Face, Camera, Text "A") or load your own image file. You can also select from various kernels including Basic Filters (Edge Detection/Sobel, Box Blur, Sharpen, Identity), Professional Filters (Laplacian, Emboss, Sharpen Pro, Scharr, Mean Removal), and Large Scale Effects (Bokeh, Double Vision, Long Motion Blur). The animation can be controlled with step forward/backward buttons (Bwd/Fwd), a play/pause button (Run), a reset button, and an animation delay slider. The output accumulates as the kernel slides, making it easy to see how each position contributes to the final result. A math panel below shows the detailed calculation for the current step, displaying all multiplication terms and the final sum.

NOTE : The tutorial uses grid heatmaps to visualize 2D matrices, where cell brightness/color represents pixel values. The operation is computed as y[i,j] = Σ_mΣ_n x[i+m, j+n] × h[m, n], where x is the input matrix, h is the kernel, and y is the output. Strictly speaking, this form slides the kernel without flipping it, which is cross-correlation; it is the operation that deep learning frameworks conventionally call convolution. The tutorial uses "Valid" padding (no padding), so the output is smaller than the input (a 32×32 input with a 3×3 kernel produces a 30×30 output). The kernel slides row by row, column by column, computing one output pixel at a time.

Mathematical Model

2D Convolution is a mathematical operation that combines two 2D matrices to produce a third matrix:

2D Discrete Convolution Formula:

y[i,j] = Σ_mΣ_n x[i+m, j+n] × h[m, n]

where:

x[i,j]: Input matrix (2D image/pixel values)
h[m,n]: Kernel or filter (2D matrix, typically 3×3 or 5×5)
y[i,j]: Output matrix (convolution result)
m, n: Summation indices over kernel dimensions

Operation: At each output position (i,j), the kernel is placed over the input matrix starting at position (i,j), then multiplied element-wise with the overlapping portion of the input. The sum of all products gives the output value at position (i,j). The kernel slides from top-left to bottom-right, computing one output pixel at a time.
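The operation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulator's actual code: the function name conv2d_valid and the small 4×4 test input are assumptions made for the example. The kernel is applied exactly as in the formula, without flipping.

```python
def conv2d_valid(x, h):
    """Slide kernel h over input x with no padding ("Valid").

    Computes y[i][j] = sum_m sum_n x[i+m][j+n] * h[m][n],
    i.e. the kernel is applied as written, without flipping.
    """
    rows, cols = len(x), len(x[0])
    k_rows, k_cols = len(h), len(h[0])
    out_rows, out_cols = rows - k_rows + 1, cols - k_cols + 1
    y = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):        # top to bottom
        for j in range(out_cols):    # left to right within a row
            total = 0.0
            for m in range(k_rows):
                for n in range(k_cols):
                    total += x[i + m][j + n] * h[m][n]
            y[i][j] = total
    return y


# Hypothetical 4x4 input: dark on the left, light on the right
x = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Sobel weights as listed in the Usage Example section
sobel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
y = conv2d_valid(x, sobel)
print(y)  # 2x2 output; every window straddles the edge, so each value is 4.0
```

The four nested loops mirror the animation: the outer two pick the output pixel, and the inner two walk over the kernel cells inside the highlight box.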

Output Size: With "Valid" padding (no padding), if the input is I×J and the kernel is M×N, the output is (I-M+1)×(J-N+1). For a 32×32 input and 3×3 kernel, the output is 30×30. For a 32×32 input and 7×7 kernel, the output is 26×26. This accounts for the kernel needing to fully overlap with the input at each position. When loading an image file, the image is automatically resized to 32×32 pixels and converted to grayscale for processing.
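As a quick check of the size rule, the arithmetic can be written out as a small helper. The name valid_output_shape is hypothetical and used only for this sketch.

```python
def valid_output_shape(input_shape, kernel_shape):
    """Return (I - M + 1, J - N + 1) for "Valid" (no) padding."""
    (I, J), (M, N) = input_shape, kernel_shape
    if M > I or N > J:
        raise ValueError("kernel must fit inside the input")
    return (I - M + 1, J - N + 1)


print(valid_output_shape((32, 32), (3, 3)))  # (30, 30)
print(valid_output_shape((32, 32), (7, 7)))  # (26, 26)
```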

Usage Example

Follow these steps to explore the 2D Convolution tutorial:

Initial State: When you first load the simulation, you'll see three panels arranged horizontally: Input Matrix (left, 32×32 grid in blue), Kernel (center, 3×3 grid in red), and Output Matrix (right, 30×30 grid in green). The default input pattern is Face, and the default kernel is Edge Detection (Sobel). The animation is paused, showing the initial state with no output computed yet.
Observe Input Matrix: The left panel shows the input matrix as a 32×32 grid. Each cell displays a pixel value through color intensity (no text labels - image-like display). Brightness represents the value (brighter = higher value). The default Face pattern shows a circular head with two eyes and a mouth. The grid uses blue shades to represent pixel values. The larger size (32×32) allows for more detailed patterns and is suitable for real image processing.
Observe Kernel: The center panel shows the kernel (filter) as a 3×3 grid. Each cell displays a kernel weight value. The default Sobel kernel is used for edge detection, with the weights [-1, 0, 1], [-2, 0, 2], [-1, 0, 1] (listed row by row). The grid uses red/orange shades, with positive values in bright red and negative values in darker red/purple.
Run the Animation: Click the "▶ Run" button to start the animation. A yellow highlight box (3×3) will slide over the Input Matrix, moving row by row, column by column. Simultaneously, the corresponding pixel in the Output Matrix will light up in green. The math panel below shows the detailed calculation for each step, displaying all 9 multiplication terms and the final sum.
Step Through Manually: Use the "Fwd ⏭" and "⏮ Bwd" buttons to move through the convolution one pixel at a time. This allows you to carefully observe how each output value is computed. Watch the yellow highlight box move over the input, and see the corresponding output pixel appear. The math panel updates to show the exact calculation for the current step. Both step buttons will pause the animation if it's currently playing.
Adjust Animation Speed: Use the "Animation Delay (ms)" slider to control the speed of the animation. The slider ranges from 0 to 200 milliseconds. Lower values make the animation faster, higher values make it slower. This allows you to find a comfortable speed for observing the convolution process. The current delay value is displayed next to the slider.
Try Different Input Patterns: Use the "Input Pattern" dropdown to select different realistic image patterns:
- Face (32×32) (default): A simple face pattern with circular head, two eyes, and a mouth - demonstrates how edge detection kernels identify facial features
- Camera (32×32): A camera shape with rectangular body, circular lens, viewfinder, and flash - shows how filters respond to geometric shapes and edges
- Text "A" (32×32): A bold letter "A" with high contrast - ideal for observing how edge detection and sharpening kernels enhance text features
Each pattern demonstrates different aspects of 2D convolution with recognizable real-world shapes. The Face pattern is particularly useful for understanding how kernels detect edges and features in images.
Load Your Own Image: Click the "Load Image" button to load your own image file. The image will be automatically resized to 32×32 pixels and converted to grayscale. This allows you to test how different kernels affect your own images. Supported image formats include JPEG, PNG, and other common image formats. After loading, you can apply any of the available kernels to see how they transform your image.
Try Different Kernels: Use the "Kernel" dropdown to select different filters organized into categories:
- Basic Filters:
  - Edge Detection (Sobel) (default): Detects vertical edges - highlights transitions from dark to light
  - Box Blur: Uniform smoothing filter - averages neighboring pixels to create blur effect
  - Sharpen: Enhances edges and details - makes images appear sharper
  - Identity: No change filter - passes input through unchanged (useful for understanding the operation)
- Professional Filters (3×3):
  - Laplacian (Edge): Isotropic edge detection - detects edges in all directions
  - Emboss (3D): Creates 3D shadow effect - gives images a raised appearance
  - Sharpen (Pro): Professional sharpening - enhances contrast and details
  - Scharr (Edge): Optimized edge detection with better rotational symmetry than Sobel
  - Mean Removal: Removes average color to highlight fine details
- Large Scale Effects (7×7):
  - Bokeh (Disc Blur): Simulates out-of-focus camera lens with circular blur pattern
  - Double Vision: Creates ghosting effect by shifting and superimposing the image
  - Long Motion Blur: Simulates fast movement with diagonal streak effect
Each kernel produces different effects. Edge detection kernels produce high values at edges and low values in uniform regions. Blur kernels smooth out variations. Large scale effects (7×7) require a larger receptive field and produce more dramatic visual effects.
Observe the Output: The right panel shows the convolution result accumulating pixel-by-pixel. Notice how the output is smaller than the input (30×30 vs 32×32) because we use "Valid" padding (no padding). The output fills from top-left to bottom-right, row by row. Each output pixel corresponds to one position of the kernel sliding over the input. Like the input, the output displays pixel values through color intensity only (no text labels - image-like display).
Understand the Highlight Box: The yellow highlight box (3×3) shows the current overlap window where the kernel is positioned. This is where the kernel overlaps with the input matrix. At each position, the convolution multiplies corresponding values (9 multiplications) and sums them. Watch how the box moves and how the output value changes based on what's inside the box.
Read the Math Panel: The math panel below the grids shows the detailed calculation for the current step. It displays all 9 multiplication terms in the format "(input × kernel) = product", then sums them to get the final output value. This helps you understand exactly how each output pixel is computed.
Reset and Experiment: Click "Reset" to clear the output and start over. Try different combinations of patterns and kernels to see how they interact. For example, try the Edge Detection kernel on the Text "A" pattern to see how it picks out the edges of the letter, or try Box Blur on the Face pattern to see the smoothing effect.
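The stepping behavior in the steps above (row by row, one output pixel per step) can be sketched as a mapping from a step counter to a position. This is an assumption about how such a counter could work, not the simulator's code; step_to_position and window_at are hypothetical names, and the numbered 32×32 grid is made up so that each cell value reveals its own position.

```python
def step_to_position(step, out_cols):
    """Map a 0-based step counter to the (row, col) of the output pixel."""
    return divmod(step, out_cols)


def window_at(x, row, col, k):
    """Return the k×k block of x covered by the highlight box at (row, col)."""
    return [r[col:col + k] for r in x[row:row + k]]


# Hypothetical 32×32 input where cell (r, c) holds r * 32 + c
x = [[r * 32 + c for c in range(32)] for r in range(32)]
out_cols = 32 - 3 + 1  # 30 columns for a 3×3 kernel

print(step_to_position(0, out_cols))    # (0, 0)   first pixel, top-left
print(step_to_position(29, out_cols))   # (0, 29)  end of the first row
print(step_to_position(30, out_cols))   # (1, 0)   wraps to the next row
print(step_to_position(899, out_cols))  # (29, 29) last of the 900 steps

row, col = step_to_position(30, out_cols)
print(window_at(x, row, col, 3))
# [[32, 33, 34], [64, 65, 66], [96, 97, 98]]
```

Stepping backward is then just decrementing the counter (without going below 0) and recomputing the position.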

Tip: The key to understanding 2D convolution is recognizing that it's a 2D sliding dot product. At each output position (i,j), the kernel is placed over the input starting at (i,j), then multiplied element-wise with the overlapping 3×3 region, and all 9 products are summed (49 products for a 7×7 kernel). The visualization shows the kernel sliding row by row, column by column, computing one output pixel at a time. Start with a simple, high-contrast pattern (Text "A") and a simple kernel (Identity) to build intuition, then try more complex combinations. Use step-by-step mode to carefully observe how each output pixel is computed. The color intensity helps you understand the magnitude of values - brighter cells have higher absolute values.

Parameters

The following are short descriptions of each parameter.

Input Matrix (x[i,j]): The 2D input matrix (32×32 grid) that the kernel will slide across. The matrix represents a realistic image pattern, with each cell containing a normalized grayscale pixel value (0.0 = black, 1.0 = white). Available pattern types include Face (circular head with eyes and mouth), Camera (rectangular body with lens and viewfinder), and Text "A" (bold letter with high contrast). You can also load your own image file, which will be automatically resized to 32×32 pixels and converted to grayscale. The matrix is displayed as a grid heatmap in blue shades, where cell brightness represents pixel value (brighter = higher value). No text values are shown in cells - only color intensity (image-like display), making it suitable for real image processing. The visualization uses a dark theme (black background) for better contrast.
Kernel (h[m,n]): The 2D filter that slides across the input matrix. The kernel size is dynamic: 3×3 for basic and professional filters, 7×7 for large scale effects. The kernel is a matrix of weights used to detect features or apply transformations. Available kernel types are organized into three categories: (1) Basic Filters - Edge Detection/Sobel (detects vertical edges), Box Blur (uniform smoothing/averaging), Sharpen (enhances edges and details), Identity (no change); (2) Professional Filters (3×3) - Laplacian (isotropic edge detection), Emboss (3D shadow effect), Sharpen Pro (professional sharpening), Scharr (optimized edge detection), Mean Removal (highlights fine details); (3) Large Scale Effects (7×7) - Bokeh (disc blur), Double Vision (ghosting effect), Long Motion Blur (diagonal streak). The kernel is displayed as a grid in red/orange shades, with positive values in bright red and negative values in darker red/purple. Each cell shows the kernel weight value with numbers visible for clarity.
Output Matrix (y[i,j]): The result of the 2D convolution operation. The output size depends on the kernel size: 30×30 for 3×3 kernels (32-3+1=30), 26×26 for 7×7 kernels (32-7+1=26). The output is smaller than the 32×32 input because the tutorial uses "Valid" padding (no padding). With Valid padding, if the input is I×J and the kernel is M×N, the output is (I-M+1)×(J-N+1). The output accumulates pixel-by-pixel as the kernel slides across the input, filling from top-left to bottom-right, row by row. The output is displayed as a grid in green shades, with each computed pixel shown as it's calculated. Active pixels (already computed) are shown in green, with brightness indicating value magnitude. No text values are shown in cells - only color intensity (image-like display). For edge detection kernels, output values are displayed as absolute values (magnitude). For stylistic filters, values are clamped between 0 and 1 for proper visualization.
2D Convolution Formula: The discrete 2D convolution is computed as y[i,j] = Σ_mΣ_n x[i+m, j+n] × h[m, n], where the double summation is over all kernel positions (m, n). At each output position (i,j), the kernel is placed over the input starting at position (i,j), multiplied element-wise with the overlapping 3×3 region, and all 9 products are summed. This operation is repeated for all output positions, with the kernel sliding row by row, column by column.
Highlight Box (Overlap Window): The yellow highlight box shows the current overlap region where the kernel is positioned over the input matrix. The box size matches the kernel size: 3×3 for basic and professional filters, 7×7 for large scale effects. This window moves from top-left to bottom-right as the convolution progresses, row by row. The output value at the current position is computed from the values inside this window (9 values for 3×3 kernels, 49 values for 7×7 kernels). The highlight box makes it easy to see exactly which input pixels are being used in the current calculation.
Valid Padding: The tutorial uses "Valid" padding (no padding), which means no zeros are added around the input matrix. This results in an output that is smaller than the input. For a 32×32 input and 3×3 kernel, the output is 30×30. This is simpler to follow than "Same" padding (which adds zeros around the input to keep the output the same size as the input), because every output value comes only from real input pixels. The kernel must fully overlap with the input at each position, so the output size is reduced by (kernel_size - 1) in each dimension.
Math Panel: The math panel below the grids displays the detailed calculation for the current step. For 3×3 kernels, it shows all 9 multiplication terms in the format "(input_value × kernel_value) = product", then sums them to show the final output value. For 7×7 kernels (49 terms), it shows the first 9 terms followed by a summary indicating how many more terms are included. This helps you understand exactly how each output pixel is computed from the input pixels and kernel weights. The panel updates in real-time as you step through the animation.
Animation Delay: A slider control that adjusts the speed of the animation by setting the delay between steps. The slider ranges from 0 to 200 milliseconds. Lower values make the animation faster (0ms = instant), higher values make it slower (200ms = more deliberate). The current delay value is displayed next to the slider. This allows you to find a comfortable speed for observing the convolution process, whether you want to see it quickly or study each step carefully.
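The display rule described for the Output Matrix (absolute value for edge detection kernels, clamping to the 0 to 1 range for stylistic filters) might look like the sketch below. The function name to_display and the is_edge_kernel flag are hypothetical, and capping edge magnitudes at 1.0 is an assumption made so that every value maps to a valid brightness; the simulator's actual scaling may differ.

```python
def to_display(value, is_edge_kernel):
    """Map a raw convolution result to a brightness in [0.0, 1.0]."""
    if is_edge_kernel:
        # Edge kernels: show magnitude, so dark-to-light and
        # light-to-dark transitions both appear bright.
        value = abs(value)
    # Assumption: cap everything to the displayable range.
    return min(1.0, max(0.0, value))


print(to_display(-0.4, is_edge_kernel=True))   # 0.4 (magnitude)
print(to_display(-0.4, is_edge_kernel=False))  # 0.0 (clamped from below)
print(to_display(1.7, is_edge_kernel=False))   # 1.0 (clamped from above)
```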

Controls and Visualizations

The following are short descriptions of each control.

Input Pattern Dropdown: Selects the type of input pattern to use. Available options include Face (32×32) - a simple face pattern with circular head, two eyes, and a mouth; Camera (32×32) - a camera shape with rectangular body, circular lens, viewfinder, and flash; and Text "A" (32×32) - a bold letter "A" with high contrast. Changing the pattern resets the animation and recalculates the convolution result. Each pattern type demonstrates different aspects of 2D convolution behavior with recognizable real-world shapes.
Load Image Button: Allows you to load your own image file. Click the button to open a file dialog, then select an image file (JPEG, PNG, or other common formats). The image will be automatically resized to 32×32 pixels and converted to grayscale for processing. After loading, you can apply any of the available kernels to see how they transform your image. Loading an image resets the animation and recalculates the convolution result.
Kernel Dropdown: Selects the type of kernel (filter) to use. The dropdown is organized into three categories: (1) Basic Filters - Edge Detection/Sobel (detects vertical edges), Box Blur (uniform smoothing), Sharpen (enhances edges), Identity (no change); (2) Professional Filters (3×3) - Laplacian (isotropic edge detection), Emboss (3D shadow effect), Sharpen Pro (professional sharpening), Scharr (optimized edge detection), Mean Removal (highlights fine details); (3) Large Scale Effects (7×7) - Bokeh (disc blur), Double Vision (ghosting effect), Long Motion Blur (diagonal streak). Changing the kernel resets the animation and recalculates the convolution result. Each kernel produces different effects on the input pattern. Edge detection kernels highlight transitions, blur kernels create smoothing effects, and large scale effects produce dramatic visual transformations.
Step Back Button (⏮ Bwd): Moves the animation backward by one pixel. This allows you to carefully observe how each output pixel is computed. The button pauses the animation if it's currently playing. Useful for detailed analysis of the convolution process.
Play/Pause Button (▶ Run / ❚❚ Pause): Starts or pauses the continuous animation. When playing, the highlight box slides across the input matrix automatically, and the output accumulates pixel-by-pixel. The button text changes to "❚❚ Pause" when playing. Clicking again pauses the animation. The animation stops automatically when it reaches the end (all output pixels computed, e.g., 900 pixels for 30×30 output).
Step Forward Button (Fwd ⏭): Moves the animation forward by one pixel. The button has an icon (⏭) on top and text ("Fwd") at the bottom. This allows you to step through the convolution manually, observing each computation step. The button pauses the animation if it's currently playing. Useful for understanding the convolution process in detail.
Animation Delay Slider: Controls the speed of the animation by adjusting the delay between steps. The slider ranges from 0 to 200 milliseconds. Lower values make the animation faster, higher values make it slower. The current delay value is displayed next to the slider. This allows you to find a comfortable speed for observing the convolution process.
Reset Button: Clears the output and resets the animation to the initial state (step 0). The output grid is cleared, and the animation is paused. This allows you to start over with the current pattern and kernel settings.
Input Matrix Panel (Left): Displays the input matrix as a 32×32 grid in blue shades. Each cell shows a pixel value through color intensity (no text labels - image-like display), with brightness representing the value magnitude. The grid is static and doesn't change during the animation. A yellow highlight box (size matches kernel size: 3×3 for basic/professional filters, 7×7 for large scale effects) moves over this grid to show the current kernel position. This panel helps you see the input pattern clearly before convolution. The larger size (32×32) allows for more detailed patterns and is suitable for real image processing.
Kernel Panel (Center): Displays the kernel as a grid in red/orange shades. The grid size matches the selected kernel: 3×3 for basic and professional filters, 7×7 for large scale effects. Each cell shows a kernel weight value with numbers visible. Positive values are shown in bright red, negative values in darker red/purple. The kernel is static and doesn't change during the animation. This panel helps you see the kernel weights that will be applied during convolution. The cells are larger and square-shaped for better visibility of the weight values.
Output Matrix Panel (Right): Displays the convolution result as a grid in green shades. The output size depends on the kernel size: 30×30 for 3×3 kernels, 26×26 for 7×7 kernels. The output accumulates pixel-by-pixel as the kernel slides, filling from top-left to bottom-right, row by row. Each computed pixel is shown with brightness indicating value magnitude (no text labels - image-like display). The currently active pixel (being computed) has a pulse animation. The visualization uses a dark theme (black background) with bright colors for visibility. For edge detection kernels, the output values are displayed as absolute values (magnitude). For stylistic filters, values are clamped between 0 and 1.
Math Panel: The calculation panel below the grids shows the detailed computation for the current step. For 3×3 kernels, it displays all 9 multiplication terms in the format "(input_value × kernel_value) = product", then sums them to show the final output value. For 7×7 kernels, it shows the first 9 of the 49 terms followed by a summary of the remaining ones. This helps you understand exactly how each output pixel is computed from the input pixels and kernel weights. The panel updates in real-time as you step through the animation.
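The math panel's text for one step could be produced along the following lines. This is only a sketch: format_math_panel is a hypothetical name, and the wording of the summary line and the two-decimal formatting are assumptions; only the "(input × kernel) = product" term format and the 9-term limit come from the description.

```python
def format_math_panel(window, kernel, max_terms=9):
    """Return the panel text for one step (hypothetical layout)."""
    # Pair each input value with the kernel weight at the same cell
    pairs = [
        (x_val, h_val)
        for x_row, h_row in zip(window, kernel)
        for x_val, h_val in zip(x_row, h_row)
    ]
    lines = [
        f"({x_val:.2f} × {h_val:.2f}) = {x_val * h_val:.2f}"
        for x_val, h_val in pairs[:max_terms]
    ]
    hidden = len(pairs) - max_terms
    if hidden > 0:
        # Larger kernels: summarize instead of listing every term
        lines.append(f"... plus {hidden} more terms")
    total = sum(x_val * h_val for x_val, h_val in pairs)
    lines.append(f"Sum = {total:.2f}")
    return "\n".join(lines)


sobel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
window_3x3 = [[0.5, 0.5, 0.5]] * 3           # uniform region, no edge
panel_3x3 = format_math_panel(window_3x3, sobel)
print(panel_3x3)                             # 9 terms, then the sum

flat_7x7 = [[1 / 49] * 7 for _ in range(7)]  # made-up averaging kernel
window_7x7 = [[1.0] * 7 for _ in range(7)]
panel_7x7 = format_math_panel(window_7x7, flat_7x7)
print(panel_7x7)                             # 9 terms, a summary line, the sum
```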

Key Concepts

2D Discrete Matrices: Both the input matrix and kernel are 2D discrete arrays, meaning they are defined at integer grid positions (i, j). The grid visualization clearly shows this 2D nature, with each cell representing a pixel value. The input is a 32×32 grid, the kernel is a 3×3 or 7×7 grid (depending on selection), and the output is a 30×30 or 26×26 grid (depending on kernel size).
Sliding Window Operation: 2D convolution is fundamentally a 2D sliding window operation. The kernel acts as a window that slides across the input matrix. At each position, the window captures a 3×3 portion of the input, and the convolution computes a weighted sum of the 9 values inside the window, where the weights are the kernel coefficients.
Element-wise Multiplication and Sum: At each output position, the kernel is placed over the input, and each of the 9 input values is multiplied by the corresponding kernel weight. All 9 products are then summed to produce the output pixel value. The math panel shows this calculation step-by-step, displaying all 9 terms and the final sum.
Valid Padding: The tutorial uses "Valid" padding (no padding), which means no zeros are added around the input matrix. This results in an output that is smaller than the input. For a 32×32 input and 3×3 kernel, the output is 30×30. For a 32×32 input and 7×7 kernel, the output is 26×26. The kernel must fully overlap with the input at each position, so the output size is reduced by (kernel_size - 1) in each dimension: (32 - 3 + 1) = 30 for 3×3 kernels, (32 - 7 + 1) = 26 for 7×7 kernels.
Pixel-by-Pixel Computation: Each output pixel is computed independently by sliding the kernel to that position, multiplying the 9 overlapping values, and summing. The visualization shows this accumulation process, making it clear that convolution is not a global operation but a local, sliding operation. The output fills from top-left to bottom-right, row by row.
Different Kernel Effects: Different kernels produce different effects:
- Smoothing Kernels (Box Blur): Average neighboring pixels, creating blur effects and reducing noise
- Edge Detection Kernels (Sobel): Highlight transitions and edges, producing high values at boundaries
- Sharpen Kernels: Enhance edges and details, making images appear sharper and more defined
- Identity Kernel: Pass input through unchanged, useful for understanding the basic operation
Color Coding: The visualization uses color coding to represent values:
- Input Matrix: Blue shades - brighter blue = higher pixel value
- Kernel: Red/Orange shades - bright red for positive weights, dark red/purple for negative weights
- Output Matrix: Green shades - brighter green = higher output value, darker/red for negative values
- Highlight Box: Yellow border with semi-transparent fill - shows current kernel position
Applications: 2D convolution is fundamental to many fields:
- Image Processing: Blurring, sharpening, edge detection, feature extraction
- Computer Vision: Object detection, pattern recognition, feature detection
- Deep Learning: Convolutional Neural Networks (CNNs) use 2D convolution as the core operation to detect features in images
- Medical Imaging: Image enhancement, noise reduction, feature detection
Real-Time Visualization: The animation updates in real-time as the kernel slides. The output accumulates pixel-by-pixel, making it easy to see how each position contributes to the final result. The step controls allow detailed analysis of individual computation steps. The math panel updates to show the exact calculation for each step, displaying all 9 multiplication terms and the final sum.
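To make the different kernel effects concrete, the sketch below applies three kernels to the same small input containing a vertical edge. It is an illustration under assumptions: the Sobel weights are the ones listed in the Usage Example, while the exact Identity and Box Blur weights (a single 1 in the center, and 1/9 in every cell) are common textbook choices assumed here rather than values taken from the simulator. The 5×5 input is made up for the example.

```python
def conv2d_valid(x, h):
    """Valid (no padding) sliding dot product of kernel h over input x."""
    k_rows, k_cols = len(h), len(h[0])
    return [
        [
            sum(x[i + m][j + n] * h[m][n]
                for m in range(k_rows) for n in range(k_cols))
            for j in range(len(x[0]) - k_cols + 1)
        ]
        for i in range(len(x) - k_rows + 1)
    ]


# Hypothetical 5×5 input: dark columns on the left, light on the right
image = [[0, 0, 0, 1, 1] for _ in range(5)]

kernels = {
    "identity": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],   # assumed weights
    "box_blur": [[1 / 9] * 3 for _ in range(3)],     # assumed weights
    "sobel": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
}

results = {name: conv2d_valid(image, k) for name, k in kernels.items()}
for name, out in results.items():
    # All output rows are identical here, so print only the first
    print(name, [round(v, 2) for v in out[0]])
```

The identity kernel reproduces the interior of the input, the box blur turns the sharp 0-to-1 step into a gradual ramp, and the Sobel kernel gives 0 in the uniform region and a large value where the window covers the edge.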