This interactive tutorial demonstrates Logistic Regression, one of the most fundamental classification algorithms in machine learning. Unlike linear regression, which predicts continuous values, Logistic Regression predicts probabilities using the sigmoid function, making it well suited to binary classification tasks. Click on the canvas to add data points, then watch as the model learns to separate them. The orange line is the decision boundary, where P = 0.5. The background color gradient shows the predicted probability: more red means a higher probability of Class 0, more blue a higher probability of Class 1.

Mathematical Foundation

Logistic Regression is a linear model for classification that predicts probabilities using the sigmoid function. Despite its name, it is used for classification, not regression!

The Sigmoid (Logistic) Function: σ(z) = 1 / (1 + e^(−z)). The sigmoid maps any real number to a value between 0 and 1, making it ideal for probability estimation. As z → +∞, σ(z) → 1; as z → −∞, σ(z) → 0.

The Model: P(y=1|x) = σ(w · x + b). The probability of class 1 is the sigmoid of a linear combination of the inputs. The decision boundary is where P = 0.5, which occurs when w · x + b = 0.

The Log Loss (Binary Cross-Entropy): L = −[y · log(p) + (1−y) · log(1−p)]. This loss heavily penalizes confident wrong predictions: if the true label is 1 and p is close to 0, then log(p) → −∞, making the loss very large.

The Sigmoid Plot: Seeing the Actual S-Curve

The Sigmoid Plot (bottom-right inset) is the key to understanding Logistic Regression. It shows the S-shaped curve that maps each point's linear score z = w · x + b to its predicted probability.
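As a minimal sketch (plain Python, no external libraries), the three formulas above can be implemented directly:

```python
import math

def sigmoid(z):
    # Maps any real z to (0, 1): sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # The model: P(y=1 | x) = sigma(w . x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def log_loss(y, p):
    # Binary cross-entropy for a single point, y in {0, 1}
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(sigmoid(0.0))       # exactly 0.5: the decision boundary, where w.x + b = 0
print(sigmoid(10.0))      # close to 1, as z -> +infinity
print(log_loss(1, 0.99))  # small loss: confident and correct
print(log_loss(1, 0.01))  # large loss: confident and wrong
```

Note how the loss for a confident wrong prediction (p = 0.01 when y = 1) is hundreds of times larger than for a confident correct one.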
Key insight: the 2D decision boundary is always a straight line. The sigmoid does not change the boundary's shape; it maps the linear output z to a probability, and the S-curve visualizes exactly this mapping.

Log Loss vs. Hinge Loss (Comparison with SVM)

The key difference between Logistic Regression and SVM lies in their loss functions. Written in terms of the margin m = y · (w · x + b) with labels y ∈ {−1, +1}, the logistic loss is log(1 + e^(−m)), while the SVM hinge loss is max(0, 1 − m).
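A small sketch contrasting the two losses as functions of the margin m = y · (w · x + b), using the standard {−1, +1} label convention:

```python
import math

def log_loss_margin(m):
    # Logistic loss as a function of the margin: log(1 + e^(-m))
    return math.log(1.0 + math.exp(-m))

def hinge_loss(m):
    # SVM hinge loss: zero once the margin reaches 1, linear below that
    return max(0.0, 1.0 - m)

for m in [-2.0, 0.0, 1.0, 3.0]:
    print(m, round(log_loss_margin(m), 4), round(hinge_loss(m), 4))
# Hinge loss is exactly 0 for m >= 1, so those points drop out of the gradient;
# logistic loss stays positive for every finite margin, so every point contributes.
```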
Consequence: in Logistic Regression, ALL points contribute to the gradient, because the logistic loss is positive for every finite margin. In SVM, only the support vectors (points on or inside the margin) matter, since the hinge loss is exactly zero for well-classified points. Note: SVM does NOT use a step function; that is the Perceptron. SVM uses the piecewise-linear hinge loss.

Gradient Descent

The gradients of the log loss are elegantly simple:
∂L/∂w = (p − y) · x and ∂L/∂b = (p − y), where p = σ(w·x + b) is the predicted probability and y is the true label (0 or 1). The update rule with L2 regularization:
w ← w − η[(p − y)x + λw], and b ← b − η(p − y) for the bias, where η is the learning rate and λ the regularization strength.

Why Is the Decision Boundary Still a Straight Line?

A common question: "If we use a curved sigmoid function, why is the boundary straight?" The answer: the sigmoid does not curve the boundary; it curves the probability transition from 0 to 1 as you move away from the boundary. The boundary itself is the set of points where w · x + b = 0, a linear equation, so in 2D it is always a straight line. Look at the Sigmoid Plot to see this transition!

Why Logistic Regression Works

The Log-Odds (Logit) Interpretation

The linear combination z = w·x + b is the log-odds (also called the logit): z = log(P / (1−P)). This means each unit increase in z multiplies the odds P/(1−P) by a factor of e; the model is linear in the log-odds even though it is nonlinear in the probability itself.
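The update rule and the logit interpretation above can be sketched as a tiny training loop. This is a minimal illustration, not the simulation's actual code; the learning rate, regularization strength, and toy data are assumed values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(points, labels, eta=0.5, lam=0.01, epochs=2000):
    # Stochastic gradient descent on the log loss with L2 regularization.
    wx, wy, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y_coord), label in zip(points, labels):
            p = sigmoid(wx * x + wy * y_coord + b)
            err = p - label                   # (p - y), the shared gradient factor
            wx -= eta * (err * x + lam * wx)  # dL/dwx = (p - y) * x, plus L2 term
            wy -= eta * (err * y_coord + lam * wy)
            b  -= eta * err                   # bias gradient has no x factor
    return wx, wy, b

# Two tiny clusters: Class 0 near (-1, -1), Class 1 near (+1, +1)
pts = [(-1.0, -1.0), (-1.2, -0.8), (1.0, 1.0), (0.8, 1.2)]
lbl = [0, 0, 1, 1]
wx, wy, b = train(pts, lbl)

p = sigmoid(wx * 1.0 + wy * 1.0 + b)
print(p)                      # high probability of Class 1 at (1, 1)
print(math.log(p / (1 - p)))  # recovers the linear score z: the log-odds
```

The last line confirms the logit interpretation: applying log(P / (1−P)) to the predicted probability gives back exactly the linear score w·x + b.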
Logistic Regression vs. SVM vs. Perceptron

Logistic Regression: log loss, smooth gradients from every point, outputs a calibrated probability. SVM: hinge loss, only the support vectors shape the boundary, outputs a signed margin. Perceptron: step-function activation, updates only on misclassified points, outputs a hard class label with no confidence.
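A short sketch of how the three models interpret the same linear score z = w · x + b; the value of z here is an arbitrary example:

```python
import math

z = 0.8  # a raw linear score w.x + b for some hypothetical point

# Logistic Regression: a probability via the sigmoid
prob = 1.0 / (1.0 + math.exp(-z))
# SVM: the signed margin itself; only its sign decides the class
margin = z
# Perceptron: a hard step function, no confidence information
step = 1 if z >= 0 else 0

print(prob, margin, step)  # same score, three different interpretations
```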
Key takeaway: Logistic Regression is the only one of the three that outputs probabilities. The sigmoid curve maps the linear output to a probability; that is its unique strength!

Controls
[Statistics readout: number of Red (Class 0) and Blue (Class 1) points, current Epoch, Log Loss, Accuracy, and the learned parameters wx (wx_new), wy (wy_new), and bias (bnew), all starting at 0.]
Left-Click to add Red (Class 0) | Right-Click or Shift+Click to add Blue (Class 1) | Drag to move points | Double-click to remove
STEP DETAILS

[Step counter, starting at Step: 0.]

Click Step Fwd ▶ to begin step-by-step training, or use Start Training for continuous training.
Interactive Controls
Understanding the Visualization
Visualization Options
Step Details Panel

When using Step Fwd ▶, the Step Details panel shows the complete calculation for each gradient descent update: the linear score z, the predicted probability p, the error (p − y), and the resulting new parameters.
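The kind of calculation the panel walks through can be reproduced by hand. This sketch uses assumed values for the learning rate, regularization, current parameters, and the training point; the names wx_new, wy_new, and b_new mirror the panel's labels:

```python
import math

# One gradient-descent step on a single hypothetical point, printing the
# intermediate quantities in the order the Step Details panel reports them.
eta, lam = 0.1, 0.01          # learning rate and L2 strength (assumed values)
wx, wy, b = 0.3, -0.2, 0.05   # current parameters (assumed values)
x, y_coord, label = 1.5, 2.0, 1

z = wx * x + wy * y_coord + b      # linear score
p = 1.0 / (1.0 + math.exp(-z))     # predicted probability sigma(z)
err = p - label                    # (p - y): negative, since the point is Class 1

wx_new = wx - eta * (err * x + lam * wx)
wy_new = wy - eta * (err * y_coord + lam * wy)
b_new  = b - eta * err

print(f"z = {z:.4f}, p = {p:.4f}, (p - y) = {err:.4f}")
print(f"wx_new = {wx_new:.4f}, wy_new = {wy_new:.4f}, b_new = {b_new:.4f}")
```

Because the point belongs to Class 1 and p < 1, the error (p − y) is negative, so the update pushes the score z for this point upward, exactly as the panel shows during training.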
Applications of Logistic Regression
Tips for Using This Simulation