This interactive tutorial demonstrates Support Vector Machines (SVM), a powerful supervised learning algorithm for classification. The key insight is that an SVM finds the optimal hyperplane that maximizes the margin: the gap between the two classes of data points. Click on the canvas to add data points, then watch as the SVM learns to separate them. The green line is the decision boundary, and the dashed lines show the margins. Points that lie on or within the margins are called Support Vectors (highlighted with yellow rings); these are the critical points that define the boundary.

Mathematical Foundation

Support Vector Machines solve a convex optimization problem: find a hyperplane that maximizes the margin while correctly classifying (most) points.

The Hyperplane Equation: w · x + b = 0, where w = (wx, wy) is the weight vector (perpendicular to the hyperplane) and b is the bias.

The Classification Rule: ŷ = sign(w · x + b). Points with w · x + b > 0 are classified as Class +1; points with w · x + b < 0 are Class -1.

The Margin: margin = 2 / ||w||. The margin is the perpendicular distance between the two dashed lines, and the SVM maximizes it.

The Objective Function (Soft-Margin SVM): minimize ½||w||² + C · Σᵢ max(0, 1 − yᵢ(w · xᵢ + b)). The first term (½||w||²) maximizes the margin. The second term (the Hinge Loss) penalizes misclassified points and points within the margin. C (or equivalently 1/λ) controls the trade-off.

Understanding Hinge Loss

The Hinge Loss is the key innovation that makes SVMs work: L(y, ŷ) = max(0, 1 − y · ŷ)
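To make the hinge loss concrete, here is a small worked example (the weights and points are made up for illustration; this is not the simulation's own code):

```python
import numpy as np

# Hypothetical trained parameters: w = (1, -1), b = 0
w = np.array([1.0, -1.0])
b = 0.0

def hinge_loss(x, y):
    """Hinge loss for a single point: max(0, 1 - y * (w . x + b))."""
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

# Confidently correct point: y*(w.x + b) = 3 >= 1, so zero loss
print(hinge_loss(np.array([2.0, -1.0]), +1))   # 0.0
# Correct but inside the margin: y*(w.x + b) = 0.5, so loss 0.5
print(hinge_loss(np.array([0.5, 0.0]), +1))    # 0.5
# Misclassified point: y*(w.x + b) = -1, so loss 2.0
print(hinge_loss(np.array([1.0, 0.0]), -1))    # 2.0
```

Only the second and third points contribute any loss; the first is ignored entirely by the objective.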
This means the SVM only "cares about" points that are close to the boundary or on the wrong side of it: the Support Vectors!

Support Vectors: Why They Matter

Support Vectors are the data points that "support," or define, the position of the hyperplane.

Key Insight: If you remove any non-support-vector point, the optimal hyperplane stays the same! Only support vectors matter for the final model. This is why SVMs are memory-efficient: at inference time, only the support vectors need to be stored.

Stochastic Gradient Descent (SGD)

This simulation uses SGD to optimize the SVM. Each step picks a training point (xᵢ, yᵢ) and applies the hinge-loss subgradient update:

if yᵢ(w · xᵢ + b) < 1:   w ← w − η(λw − yᵢ xᵢ),   b ← b + η yᵢ
otherwise:               w ← w − η λ w
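Each step applies the standard hinge-loss subgradient update. A minimal runnable sketch of such a training loop on toy data (the blob data, variable names eta/lam, and hyperparameter values are illustrative, not the simulation's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two separable blobs, labels -1 (red) and +1 (blue)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(+2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

w = np.zeros(2)
b = 0.0
eta, lam = 0.01, 0.01   # learning rate and regularization strength

for epoch in range(200):
    for i in rng.permutation(len(X)):
        if y[i] * (X[i] @ w + b) < 1:      # point violates the margin
            w -= eta * (lam * w - y[i] * X[i])
            b += eta * y[i]
        else:                              # margin satisfied: only shrink w
            w -= eta * lam * w

preds = np.sign(X @ w + b)
print("accuracy:", (preds == y).mean())
print("margin width:", 2 / np.linalg.norm(w))
```

Note that only margin-violating points push w toward them; points far on the correct side contribute nothing but the shrinkage term.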
where η is the learning rate and λ is the regularization parameter.

Comparing SVM to Other Classifiers
The Kernel Trick (Beyond Linear)

This visualization shows a Linear SVM. For data that is not linearly separable, SVMs can use the Kernel Trick to implicitly project the data into a higher-dimensional space where it becomes linearly separable.
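As a sketch of the idea: a kernel SVM's decision function needs only kernel evaluations K(xᵢ, x), never the high-dimensional coordinates themselves. Below is an illustrative RBF-kernel decision function in the dual form f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b; the support vectors and dual coefficients αᵢ are made-up values standing in for what a real solver would produce:

```python
import numpy as np

def rbf_kernel(a, b, gamma=2.0):
    """K(a, b) = exp(-gamma * ||a - b||^2): a dot product in an
    infinite-dimensional feature space, computed in closed form."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Hypothetical support vectors, labels, and dual coefficients (alpha);
# illustrative values only (note sum(alpha * sv_y) = 0).
sv = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
sv_y = np.array([+1, +1, -1])
alpha = np.array([1.0, 1.0, 2.0])
b = 0.0

def decision(x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b -- only kernel calls."""
    return sum(a * y * rbf_kernel(s, x) for a, y, s in zip(alpha, sv_y, sv)) + b

print(np.sign(decision(np.array([0.0, 1.2]))))   # near a +1 support vector
print(np.sign(decision(np.array([0.5, 0.5]))))   # at the -1 support vector
```

Swapping `rbf_kernel` for a polynomial or linear kernel changes the feature space without changing any other line.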
The beauty of kernels is that we never actually compute the high-dimensional coordinates; we only compute dot products, which the kernel function evaluates efficiently.

Applications of SVM
Controls

Live statistics shown during training: Red Points, Blue Points, Support Vectors, Epoch, Loss, Accuracy, the learned parameters wx, wy, and bias b, and the Margin Width (shown as ∞ before training begins).
Left-click to add Red (Class -1) | Right-click or Shift+Click to add Blue (Class +1) | Drag to move points | Double-click to remove
Step Details

Click Step Fwd ▶ to begin step-by-step training, or use Start Training for continuous training.
Application: SVM Video Recommendation Lab

Sci-Fi vs Romance: classify "Like" vs "Dislike" by Action intensity and Dialogue density. Each user has a personal SVM (their own hyperplane). Switch users to see different "taste mountains" and a live recommendation feed.
Why SVM for recommendations? Data efficiency: deep learning needs thousands of clicks, while an SVM can form a meaningful boundary with 5–10 points. Instant adaptability: like two Sci‑Fi movies and the SVM immediately carves out a "Like" zone (a solution to the cold-start problem).
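The per-user feed boils down to scoring each video by its decision value (proportional to its signed distance from the user's hyperplane) and sorting. A sketch with invented feature values and weights:

```python
import numpy as np

# Per-user linear model (hypothetical learned values); features are
# (action intensity, dialogue density), each scaled to [0, 1].
w = np.array([1.8, -1.2])   # this user likes action, dislikes heavy dialogue
b = -0.3

videos = {
    "Star Raiders":    np.array([0.9, 0.2]),   # made-up catalog entries
    "Moonlit Letters": np.array([0.1, 0.9]),
    "Chrome Dawn":     np.array([0.7, 0.4]),
}

# Decision value w.x + b: positive means "recommend"
scores = {name: float(np.dot(w, x) + b) for name, x in videos.items()}
feed = sorted(scores, key=scores.get, reverse=True)
print(feed)
```

Because the model is just (w, b), updating it after a new Like or Dislike and re-sorting the feed is essentially free.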
The lab shows three linked panels: a 2D feature space (Action vs Dialogue) with the videos plotted on the map, a 3D decision surface (Z = f(x, y)), and a recommendation feed that sorts videos by their distance from your hyperplane (positive = recommend). The legend marks Like (Sci-Fi), Dislike (Romance), and Support Vector points, along with a live Support Vectors count.
f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b). Left-click = Like, right-click = Dislike; drag to move; double-click a point to remove it. Margin lines are drawn at f = ±1.