This interactive tutorial demonstrates Support Vector Machines (SVM), a powerful supervised learning algorithm for classification. The key insight is that an SVM finds the optimal hyperplane that maximizes the margin: the gap between the two classes of data points. Click on the canvas to add data points, then watch as the SVM learns to separate them. The green line is the decision boundary, and the dashed lines show the margins. Points that lie on or within the margins are called Support Vectors (highlighted with yellow rings); these are the critical points that define the boundary.

Mathematical Foundation

Support Vector Machines solve a convex optimization problem: find a hyperplane that maximizes the margin while correctly classifying (most) points.

The Hyperplane Equation: w · x + b = 0, where w = (wx, wy) is the weight vector (perpendicular to the hyperplane) and b is the bias.

The Classification Rule: ŷ = sign(w · x + b). Points with w · x + b > 0 are classified as Class +1; points with w · x + b < 0 are Class -1.

The Margin: margin = 2 / ||w||. The margin is the perpendicular distance between the two dashed lines, and the SVM maximizes it.

The Objective Function (Soft-Margin SVM): minimize ½||w||² + C · Σᵢ max(0, 1 − yᵢ(w · xᵢ + b)). The first term (½||w||²) maximizes the margin. The second term (the Hinge Loss) penalizes misclassified points and points within the margin. C (or equivalently 1/λ) controls the trade-off.

Understanding Hinge Loss

The Hinge Loss is the key innovation that makes SVMs work: L(y, ŷ) = max(0, 1 − y · ŷ)
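To make the hinge loss concrete, here is a small worked example (the weights and points are made up for illustration; this is not the simulation's own code):

```python
import numpy as np

# Hypothetical trained parameters: w = (1, -1), b = 0
w = np.array([1.0, -1.0])
b = 0.0

def hinge_loss(x, y):
    """Hinge loss for a single point: max(0, 1 - y * (w . x + b))."""
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

# Confidently correct point: y*(w.x + b) = 3 >= 1, so zero loss
print(hinge_loss(np.array([2.0, -1.0]), +1))   # 0.0
# Correct but inside the margin: y*(w.x + b) = 0.5, so loss 0.5
print(hinge_loss(np.array([0.5, 0.0]), +1))    # 0.5
# Misclassified point: y*(w.x + b) = -1, so loss 2.0
print(hinge_loss(np.array([1.0, 0.0]), -1))    # 2.0
```

Only the second and third points contribute any loss; the first is ignored entirely by the objective.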
This means the SVM only "cares about" points that are close to the boundary or on the wrong side of it: the Support Vectors!

Support Vectors: Why They Matter

Support Vectors are the data points that "support," or define, the position of the hyperplane.

Key Insight: If you remove any non-support-vector point, the optimal hyperplane stays the same! Only support vectors matter for the final model. This is why SVMs are memory-efficient: at inference time, only the support vectors need to be stored.

Stochastic Gradient Descent (SGD)

This simulation uses SGD to optimize the SVM. Each step picks a training point (xᵢ, yᵢ) and applies the hinge-loss subgradient update:

if yᵢ(w · xᵢ + b) < 1:   w ← w − η(λw − yᵢ xᵢ),   b ← b + η yᵢ
otherwise:               w ← w − η λ w
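Each step applies the standard hinge-loss subgradient update. A minimal runnable sketch of such a training loop on toy data (the blob data, variable names eta/lam, and hyperparameter values are illustrative, not the simulation's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two separable blobs, labels -1 (red) and +1 (blue)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(+2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

w = np.zeros(2)
b = 0.0
eta, lam = 0.01, 0.01   # learning rate and regularization strength

for epoch in range(200):
    for i in rng.permutation(len(X)):
        if y[i] * (X[i] @ w + b) < 1:      # point violates the margin
            w -= eta * (lam * w - y[i] * X[i])
            b += eta * y[i]
        else:                              # margin satisfied: only shrink w
            w -= eta * lam * w

preds = np.sign(X @ w + b)
print("accuracy:", (preds == y).mean())
print("margin width:", 2 / np.linalg.norm(w))
```

Note that only margin-violating points push w toward them; points far on the correct side contribute nothing but the shrinkage term.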
where η is the learning rate and λ is the regularization parameter.

Comparing SVM to Other Classifiers
The Kernel Trick (Beyond Linear)

This visualization shows a Linear SVM. For data that is not linearly separable, SVMs can use the Kernel Trick to implicitly project the data into a higher-dimensional space where it becomes linearly separable.
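As a sketch of the idea: a kernel SVM's decision function needs only kernel evaluations K(xᵢ, x), never the high-dimensional coordinates themselves. Below is an illustrative RBF-kernel decision function in the dual form f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b; the support vectors and dual coefficients αᵢ are made-up values standing in for what a real solver would produce:

```python
import numpy as np

def rbf_kernel(a, b, gamma=2.0):
    """K(a, b) = exp(-gamma * ||a - b||^2): a dot product in an
    infinite-dimensional feature space, computed in closed form."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Hypothetical support vectors, labels, and dual coefficients (alpha);
# illustrative values only (note sum(alpha * sv_y) = 0).
sv = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
sv_y = np.array([+1, +1, -1])
alpha = np.array([1.0, 1.0, 2.0])
b = 0.0

def decision(x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b -- only kernel calls."""
    return sum(a * y * rbf_kernel(s, x) for a, y, s in zip(alpha, sv_y, sv)) + b

print(np.sign(decision(np.array([0.0, 1.2]))))   # near a +1 support vector
print(np.sign(decision(np.array([0.5, 0.5]))))   # at the -1 support vector
```

Swapping `rbf_kernel` for a polynomial or linear kernel changes the feature space without changing any other line.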
The beauty of kernels is that we never actually compute the high-dimensional coordinates; we only compute dot products, which the kernel function evaluates efficiently.

Applications of SVM
Controls

Live statistics shown during training: Red Points, Blue Points, Support Vectors, Epoch, Loss, Accuracy, the learned parameters wx, wy, and bias b, and the Margin Width (shown as ∞ before training begins).
Left-click to add Red (Class -1) | Right-click or Shift+Click to add Blue (Class +1) | Drag to move points | Double-click to remove
Step Details

Click Step Fwd ▶ to begin step-by-step training, or use Start Training for continuous training.
Application: SVM Video Recommendation Lab

Sci-Fi vs Romance: classify "Like" vs "Dislike" by Action intensity and Dialogue density. Each user has a personal SVM (their own hyperplane). Switch users to see different "taste mountains" and a live recommendation feed.
Why SVM for recommendations? Data efficiency: deep learning needs thousands of clicks, while an SVM can form a meaningful boundary with 5–10 points. Instant adaptability: like two Sci‑Fi movies and the SVM immediately carves out a "Like" zone (a solution to the cold-start problem).
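The per-user feed boils down to scoring each video by its decision value (proportional to its signed distance from the user's hyperplane) and sorting. A sketch with invented feature values and weights:

```python
import numpy as np

# Per-user linear model (hypothetical learned values); features are
# (action intensity, dialogue density), each scaled to [0, 1].
w = np.array([1.8, -1.2])   # this user likes action, dislikes heavy dialogue
b = -0.3

videos = {
    "Star Raiders":    np.array([0.9, 0.2]),   # made-up catalog entries
    "Moonlit Letters": np.array([0.1, 0.9]),
    "Chrome Dawn":     np.array([0.7, 0.4]),
}

# Decision value w.x + b: positive means "recommend"
scores = {name: float(np.dot(w, x) + b) for name, x in videos.items()}
feed = sorted(scores, key=scores.get, reverse=True)
print(feed)
```

Because the model is just (w, b), updating it after a new Like or Dislike and re-sorting the feed is essentially free.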
The lab shows three linked panels: a 2D feature space (Action vs Dialogue) with the videos plotted on the map, a 3D decision surface (Z = f(x, y)), and a recommendation feed that sorts videos by their distance from your hyperplane (positive = recommend). The legend marks Like (Sci-Fi), Dislike (Romance), and Support Vector points, along with a live Support Vectors count.
f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b). Left-click = Like, right-click = Dislike; drag to move; double-click a point to remove it. Margin lines are drawn at f = ±1.