Support Vector Machine (SVM) Tutorial 

This interactive tutorial demonstrates Support Vector Machines (SVM), a powerful supervised learning algorithm for classification. The key insight is that SVM finds the optimal hyperplane that maximizes the margin - the gap between the two classes of data points.

Click on the canvas to add data points, then watch as the SVM learns to separate them. The green line is the decision boundary, and the dashed lines show the margins. Points that lie on or within the margins are called Support Vectors (highlighted with yellow rings) - these are the critical points that define the boundary.

Mathematical Foundation

Support Vector Machines solve a convex optimization problem. The goal is to find a hyperplane that maximizes the margin while correctly classifying (most) points.

The Hyperplane Equation:

w · x + b = 0

where w = (wx, wy) is the weight vector (perpendicular to the hyperplane) and b is the bias.

The Classification Rule:

ŷ = sign(w · x + b)

Points with w·x + b > 0 are classified as Class +1, points with w·x + b < 0 are Class -1.
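As a minimal sketch (plain Python for illustration, not the simulation's own code), the classification rule above can be written as:

```python
def predict(w, b, x):
    """Classify a 2-D point x using weights w = (wx, wy) and bias b: sign(w.x + b)."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return 1 if score > 0 else -1

# With w = (1, -1) and b = 0 the boundary is the line y = x:
print(predict((1.0, -1.0), 0.0, (2.0, 1.0)))  # prints 1  (point below the line)
print(predict((1.0, -1.0), 0.0, (1.0, 3.0)))  # prints -1 (point above the line)
```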

The Margin:

margin = 2 / ||w||

The margin is the perpendicular distance between the two dashed lines. SVM maximizes this margin.

The Objective Function (Soft-Margin SVM):

minimize: ½||w||² + C · Σ max(0, 1 - yi(w·xi + b))

The first term (½||w||²) maximizes the margin. The second term (Hinge Loss) penalizes misclassified points and points within the margin. C (or 1/λ) controls the trade-off.
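A sketch of this objective in plain Python (the name `objective` is illustrative, not taken from the simulation):

```python
def objective(w, b, points, labels, C):
    """Soft-margin SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    reg = 0.5 * (w[0] ** 2 + w[1] ** 2)
    hinge_sum = sum(max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b))
                    for x, y in zip(points, labels))
    return reg + C * hinge_sum

# Two well-separated points incur no hinge loss, so only the margin term remains:
pts, ys = [(2.0, 0.0), (-2.0, 0.0)], [1, -1]
print(objective((1.0, 0.0), 0.0, pts, ys, C=1.0))  # prints 0.5
```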

Understanding Hinge Loss

The Hinge Loss is the key innovation that makes SVMs work:

L(y, f(x)) = max(0, 1 - y · f(x)), where f(x) = w · x + b is the raw score (before taking the sign)

Writing margin = y · f(x):

  • When margin ≥ 1: Loss = 0 (point is correctly classified with sufficient margin)
  • When 0 < margin < 1: Loss > 0 (point is correct but within the margin)
  • When margin < 0: Loss > 1 (point is misclassified)

This means SVM only "cares about" points that are close to or on the wrong side of the boundary - the Support Vectors!
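The three cases above can be checked with a few lines of Python (illustrative only):

```python
def hinge_loss(y, score):
    """Hinge loss on the raw score f(x) = w.x + b, for a true label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.0))   # prints 0.0 (correct, outside the margin)
print(hinge_loss(+1, 0.5))   # prints 0.5 (correct, but within the margin)
print(hinge_loss(+1, -0.5))  # prints 1.5 (misclassified)
```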

Support Vectors: Why They Matter

Support Vectors are the data points that "support" or define the position of the hyperplane:

  • On the Margin: Points exactly at distance 1/||w|| from the boundary
  • Within the Margin: Points closer than 1/||w|| to the boundary
  • Misclassified: Points on the wrong side of the boundary

Key Insight: If you remove any non-support-vector point, the optimal hyperplane stays the same! Only support vectors matter for the final model. This is why SVMs are memory-efficient - at inference time, only support vectors need to be stored.
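A sketch of how support vectors could be identified from a trained model (the helper below is illustrative, not the simulation's code):

```python
def support_vectors(w, b, points, labels, tol=1e-9):
    """Return points whose functional margin y*(w.x + b) is <= 1 (on or inside the margin)."""
    return [x for x, y in zip(points, labels)
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 1.0 + tol]

# With boundary x = 0, the points at distance 1 touch the margin; (3, 0) does not:
pts = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0)]
ys = [1, 1, -1]
print(support_vectors((1.0, 0.0), 0.0, pts, ys))  # prints [(1.0, 0.0), (-1.0, 0.0)]
```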

Stochastic Gradient Descent (SGD)

This simulation uses SGD to optimize the SVM. Each step:

  1. Pick a random point from the dataset
  2. Calculate its margin: yi(w·xi + b)
  3. If margin < 1 (support vector or misclassified):
    Update: w ← w - η(λw - yixi), b ← b + ηyi
  4. If margin ≥ 1 (correctly classified, outside margin):
    Update: w ← w - ηλw (only regularization)

where η is the learning rate and λ is the regularization parameter.
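The four steps above can be sketched as a single Pegasos-style update in Python (a simplified illustration, not the simulation's implementation):

```python
import random

def sgd_step(w, b, points, labels, eta, lam):
    """One SGD update following steps 1-4 above."""
    i = random.randrange(len(points))               # 1. pick a random point
    x, y = points[i], labels[i]
    margin = y * (w[0] * x[0] + w[1] * x[1] + b)    # 2. functional margin
    if margin < 1:                                  # 3. hinge gradient + regularization
        w = (w[0] - eta * (lam * w[0] - y * x[0]),
             w[1] - eta * (lam * w[1] - y * x[1]))
        b = b + eta * y
    else:                                           # 4. regularization (weight decay) only
        w = (w[0] - eta * lam * w[0], w[1] - eta * lam * w[1])
    return w, b

# Single-point dataset so the "random" choice is deterministic:
w, b = sgd_step((0.0, 0.0), 0.0, [(1.0, 1.0)], [1], eta=0.1, lam=0.01)
print(w, b)  # prints (0.1, 0.1) 0.1
```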

Comparing SVM to Other Classifiers

  • vs. Perceptron: Perceptron finds any separating line; SVM finds the best one (maximum margin).
  • vs. Logistic Regression: LR uses all points equally; SVM focuses only on support vectors near the boundary.
  • vs. Neural Networks: SVMs have strong theoretical guarantees and work well with small datasets; NNs need more data but can learn complex non-linear patterns.

The Kernel Trick (Beyond Linear)

This visualization shows a Linear SVM. For non-linearly separable data, SVMs can use the Kernel Trick to project data into a higher-dimensional space where it becomes linearly separable:

  • Polynomial Kernel: K(x, y) = (x·y + c)ᵈ
  • RBF (Gaussian) Kernel: K(x, y) = exp(-γ||x-y||²)
  • Sigmoid Kernel: K(x, y) = tanh(αx·y + c)

The beauty of kernels is that we never actually compute the high-dimensional coordinates - we only compute dot products, which can be done efficiently using the kernel function.
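The kernels above are just functions of dot products and distances; here is an illustrative Python sketch of two of them (parameter defaults are arbitrary examples):

```python
import math

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel K(x, z) = (x.z + c)^d."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

print(poly_kernel((1.0, 0.0), (1.0, 0.0)))  # prints 4.0
print(rbf_kernel((1.0, 0.0), (1.0, 0.0)))   # prints 1.0 (identical points)
```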

Applications of SVM

  • Image Classification: Handwritten digit recognition, face detection
  • Text Categorization: Spam filtering, sentiment analysis
  • Bioinformatics: Gene classification, protein structure prediction
  • Finance: Credit scoring, fraud detection
  • Medical Diagnosis: Disease classification from patient data

Interactive Controls

  • Left-Click: Add a Red point (Class -1).
  • Right-Click (or Shift+Click): Add a Blue point (Class +1).
  • Drag: Click and drag any point to move it. Watch how the boundary adapts in real-time!
  • Double-Click: Remove the nearest point.
  • Start/Stop Training: Toggle the SGD optimization process. Watch the boundary wiggle and settle!
  • Step Fwd ▶: Execute a single SGD training step with full calculation details shown in the Step Details panel.
  • ◀ Step Back: Rewind to the previous training step. Useful for understanding how each update affects the model.
  • Reset Model: Randomize the weights while keeping data points.
  • Clear All: Remove all points and reset the model.
  • Learning Rate (η): Controls how dramatically the boundary adjusts each step. Higher = faster but potentially unstable.
  • Regularization (λ): Controls the trade-off between margin width and classification errors. Higher λ = wider margin.
  • Preset: Load different data configurations to explore various scenarios.

Understanding the Visualization

  • Green Line: The decision boundary (hyperplane) where wx·x + wy·y + b = 0.
  • Blue Dashed Line: Upper margin where w·x + b = +1 (Class +1 side).
  • Red Dashed Line: Lower margin where w·x + b = -1 (Class -1 side).
  • Yellow Rings: Support Vectors - points within or on the margin that determine the boundary position.
  • White Highlight: The currently selected point being processed in step-by-step training mode.
  • Colored Regions: Decision regions showing which class each area belongs to (when enabled).
  • Step Details Panel: when using Step Fwd ▶, this panel shows the complete calculation for each SGD update:
      • Selected Point: Which point was randomly chosen for this update (highlighted on canvas).
      • Prediction Calculation: How wx·x + wy·y + b is computed for this point.
      • Margin Check: margin = label × prediction. Determines if hinge loss applies.
      • Weight Updates: Step-by-step calculation showing how wx, wy, and b are updated.
        • If margin < 1: Applies both hinge loss gradient and regularization.
        • If margin ≥ 1: Only applies regularization (weight decay).
      • Hinge Loss: L = max(0, 1 - margin) for this point.
      • Decision Boundary Equation: The resulting line equation y = (slope)·x + (intercept).
      • Margin Boundaries: Equations for the blue (+1) and red (-1) margin lines. Each lies at distance 1/||w|| from the boundary, so the full margin width is 2/||w||.
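The conversion from (wx, wy, b) to the displayed line equations can be sketched as follows (an illustrative helper, assuming wy ≠ 0 so the boundary is not vertical):

```python
import math

def boundary_equations(wx, wy, b):
    """Slope-intercept forms of the boundary and the +1/-1 margin lines."""
    slope = -wx / wy
    center = -b / wy                   # boundary:    w.x + b = 0
    upper = (1.0 - b) / wy             # blue margin: w.x + b = +1
    lower = (-1.0 - b) / wy            # red margin:  w.x + b = -1
    width = 2.0 / math.hypot(wx, wy)   # perpendicular distance between the margins
    return slope, center, upper, lower, width

s, c, u, l, wd = boundary_equations(1, 1, 0)
print(s, c, u, l)  # prints -1.0 0.0 1.0 -1.0
```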

Tips for Using This Simulation

  • Use Step-by-Step Mode: Click Step Fwd ▶ to see exactly how each training update works. Watch the highlighted point and follow the calculations!
  • Use Step Back: Made a step you want to review? Use ◀ Step Back to rewind and compare before/after states.
  • Watch the Equations: After each step, scroll down to see how the decision boundary equation (y = mx + b) changes.
  • Experiment with Learning Rate: Too high → boundary oscillates wildly. Too low → convergence is slow.
  • Watch the Support Vectors: Notice how only points near the boundary (yellow rings) affect the final position.
  • Try Overlapping Classes: See how SVM handles data that can't be perfectly separated.
  • Add an Outlier: Place a point on the "wrong" side and watch how the boundary adjusts.
  • Compare Regularization: Low λ → narrow margin, fits training data tightly. High λ → wide margin, more generalizable.
  • Watch Margin Width: In the Step Details panel, observe how ||w|| and the margin width (2/||w||) change as training progresses.