Logistic Classification Tutorial 

This interactive tutorial demonstrates Logistic Regression, one of the most fundamental classification algorithms in machine learning. Unlike linear regression, which predicts continuous values, Logistic Regression predicts probabilities using the sigmoid function, making it ideal for binary classification tasks.

Click on the canvas to add data points, then watch as the model learns to separate them. The orange line is the decision boundary where P = 0.5. The background color gradient shows the predicted probability — more red means higher probability of Class 0, more blue means higher probability of Class 1.

Mathematical Foundation

Logistic Regression is a linear model for classification that predicts probabilities using the sigmoid function. Despite its name, it's used for classification, not regression!

The Sigmoid (Logistic) Function:

σ(z) = 1 / (1 + e^(−z))

The sigmoid function maps any real number to a value between 0 and 1, making it perfect for probability estimation. When z → +∞, σ(z) → 1. When z → −∞, σ(z) → 0.
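The asymptotic behavior described above is easy to verify directly. A minimal sketch of the sigmoid in plain Python:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5 — exactly on the decision boundary
print(sigmoid(10))   # ~0.99995 — as z → +∞, σ(z) → 1
print(sigmoid(-10))  # ~0.00005 — as z → −∞, σ(z) → 0
```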

The Model:

P(y=1|x) = σ(w · x + b)

The probability of class 1 is the sigmoid of the linear combination of inputs. The decision boundary is where P = 0.5, which occurs when w · x + b = 0.

The Log Loss (Binary Cross-Entropy):

L = −[y · log(p) + (1−y) · log(1−p)]

This loss function heavily penalizes confident wrong predictions. If the true label is 1 and p is close to 0, log(p) → −∞, making the loss very high.
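To see the asymmetry concretely, here is a small sketch evaluating the binary cross-entropy at a few predicted probabilities for a true label of 1:

```python
import math

def log_loss(y: int, p: float) -> float:
    """Binary cross-entropy for one point: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(log_loss(1, 0.99))  # ~0.01  — confident and correct: tiny loss
print(log_loss(1, 0.5))   # ~0.693 — uncertain: moderate loss (ln 2)
print(log_loss(1, 0.01))  # ~4.61  — confident and wrong: huge loss
```

As p → 0 with true label 1, the loss grows without bound, which is exactly the "heavy penalty" described above.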

The Sigmoid Plot — Seeing the Actual S-Curve

The Sigmoid Plot (bottom-right inset) is the key to understanding Logistic Regression! It shows:

  • X-axis (z): The "logit" value z = w·x + b — proportional to the signed distance from the decision boundary.
  • Y-axis (P): The predicted probability from 0 to 1.
  • Green S-Curve: The sigmoid function σ(z) = 1/(1+e^(−z)) — this IS the logistic curve!
  • Data Points: Each point is plotted ON the curve based on its z value.

What you'll observe:

  • Points with z << 0 (far left on curve) → P ≈ 0 (confident Red prediction)
  • Points with z ≈ 0 (center) → P ≈ 0.5 (uncertain, on the boundary)
  • Points with z >> 0 (far right on curve) → P ≈ 1 (confident Blue prediction)
  • As training progresses, watch points slide along the S-curve!

Key insight: The 2D decision boundary is always a straight line. The sigmoid doesn't change the boundary shape — it maps the linear output z to a probability. The S-curve visualizes this mapping!

Log Loss vs. Hinge Loss (Comparison with SVM)

The key difference between Logistic Regression and SVM lies in their loss functions:

  • Log Loss (Logistic): L = −log(p) for the true class — always positive, even on correct predictions. Wants MORE confidence forever.
  • Hinge Loss (SVM): L = max(0, 1 − margin) — zero loss once margin ≥ 1. Stops caring after sufficient margin.

Consequence: In Logistic Regression, ALL points contribute to the gradient. In SVM, only Support Vectors (points near the boundary) matter.

Note: SVM does NOT use a step function — that's the Perceptron! SVM uses piece-wise linear hinge loss.
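The two losses can be put on the same axis using the ±1 label convention, where the margin is m = y·z and the log loss becomes log(1 + e^(−m)) — a standard identity. A minimal comparison sketch:

```python
import math

def log_loss_margin(m: float) -> float:
    """Log loss written in terms of the margin m = y*z, with y in {-1, +1}."""
    return math.log(1.0 + math.exp(-m))

def hinge_loss(m: float) -> float:
    """Hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - m)

for m in [-2, 0, 1, 2, 5]:
    print(f"margin={m:+}: log={log_loss_margin(m):.4f}  hinge={hinge_loss(m):.4f}")
# Hinge is exactly 0 for m >= 1; log loss keeps shrinking but never reaches 0,
# which is why every point keeps contributing to the gradient in Logistic Regression.
```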

Gradient Descent

The gradients of log loss are elegantly simple:

∂L/∂w = (p − y) · x
∂L/∂b = (p − y)

where p = σ(w·x + b) is the predicted probability and y is the true label (0 or 1).

The update rule with regularization:

w ← w − η[(p − y)x + λw]
b ← b − η(p − y)
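The update rule above can be sketched as a single SGD step. This is a minimal illustration (the simulation's internal code may differ); the point x = [1.0, 2.0] and the hyperparameter values are hypothetical:

```python
import math

def sgd_step(w, b, x, y, eta=0.5, lam=0.01):
    """One stochastic gradient descent update for a single point (x, y)."""
    z = w[0]*x[0] + w[1]*x[1] + b      # logit
    p = 1.0 / (1.0 + math.exp(-z))     # predicted probability sigma(z)
    err = p - y                        # (p - y) appears in both gradients
    w = [w[0] - eta*(err*x[0] + lam*w[0]),   # w <- w - eta[(p-y)x + lam*w]
         w[1] - eta*(err*x[1] + lam*w[1])]
    b = b - eta*err                          # b <- b - eta(p-y)
    return w, b

w, b = [0.0, 0.0], 0.0
w, b = sgd_step(w, b, x=[1.0, 2.0], y=1)
print(w, b)  # weights and bias all move positive, toward classifying (1, 2) as class 1
```

With zero initial weights, p = 0.5 and err = −0.5, so each parameter moves in proportion to its input — exactly the (p − y)·x form of the gradient.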

Why Is the Decision Boundary Still a Straight Line?

A common question: "If we use a curved sigmoid function, why is the boundary straight?" The answer:

  • The decision boundary is where P = 0.5, which occurs when σ(z) = 0.5
  • This happens when z = 0, i.e., when w·x + b = 0
  • The equation w·x + b = 0 is a linear equation — always a straight line in 2D!

The sigmoid doesn't curve the boundary — it curves the probability transition from 0 to 1 as you move away from the boundary. Look at the Sigmoid Plot to see this transition!
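In 2D with w = (wx, wy), solving wx·x + wy·y + b = 0 for y makes the straight-line claim explicit. A small sketch with hypothetical weight values:

```python
# The boundary wx*x + wy*y + b = 0 rearranges to y = -(wx/wy)*x - b/wy,
# a straight line whenever wy != 0.
wx, wy, b = 1.5, -2.0, 0.5   # hypothetical trained parameters
slope = -wx / wy             # 0.75
intercept = -b / wy          # 0.25
print(f"decision boundary: y = {slope}x + {intercept}")
```

No matter how the sigmoid warps probabilities, this rearrangement only ever involves the linear part — hence the straight line.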

Why Logistic Regression Works

  • Probabilistic Output: Unlike SVM which gives class labels, Logistic Regression gives you probability estimates. You know not just WHAT the prediction is, but HOW CONFIDENT the model is.
  • Convex Optimization: Log loss is convex, guaranteeing a unique global minimum. No local minima traps!
  • Interpretable Coefficients: Each weight wi tells you how much feature xi affects the log-odds of class 1.
  • Smooth Gradients: Unlike Perceptron's discrete jumps, sigmoid provides smooth gradients that guide optimization reliably.

The Log-Odds (Logit) Interpretation

The linear combination z = w·x + b is the log-odds (also called logit):

z = log(P / (1−P))

This means:

  • z = 0 → P = 0.5 (50/50 odds, on the decision boundary)
  • z = 1 → P ≈ 0.73 (odds of about 2.7:1 for class 1)
  • z = 2 → P ≈ 0.88 (odds of about 7.4:1 for class 1)
  • z = −1 → P ≈ 0.27 (odds of about 2.7:1 for class 0)
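The table above can be checked by applying the sigmoid and then inverting it — the logit log(P/(1−P)) recovers z exactly:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z in [0.0, 1.0, 2.0, -1.0]:
    p = sigmoid(z)
    logit = math.log(p / (1.0 - p))  # inverts the sigmoid: equals z
    print(f"z={z:+.0f}  P={p:.2f}  log(P/(1-P))={logit:+.2f}")
```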

Logistic Regression vs. SVM vs. Perceptron

Aspect            | Perceptron             | Logistic Regression        | SVM
------------------+------------------------+----------------------------+-------------------------
Activation        | Step function: sign(z) | Sigmoid: σ(z)              | Identity (just z)
Output            | Class label {−1, +1}   | Probability (0 to 1)       | Class label (+1 or −1)
Loss Function     | 0/1 misclassification  | Log Loss (Cross-Entropy)   | Hinge Loss
Points Used       | Only misclassified     | ALL points contribute      | Only Support Vectors
Goal              | Find any separator     | Maximize likelihood        | Maximize margin
Gradient Behavior | Jumps discretely       | Always smooth, never zero  | Zero when margin > 1
Best For          | Simple, separable data | Need probability estimates | High-dimensional, sparse

Key takeaway: Logistic Regression is the only one that outputs probabilities. The sigmoid curve maps the linear output to a probability — that's its unique strength!
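Everything above fits in a few lines. A minimal end-to-end training sketch on hypothetical toy data (two well-separated 2D clusters), mirroring the simulation's SGD loop:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: class 0 clusters near (1, 1), class 1 near (3, 3)
data = [([1.0, 1.2], 0), ([0.8, 1.0], 0), ([1.2, 0.9], 0),
        ([3.0, 2.8], 1), ([3.2, 3.1], 1), ([2.9, 3.0], 1)]

w, b, eta, lam = [0.0, 0.0], 0.0, 0.5, 0.01
random.seed(0)
for step in range(200):
    x, y = random.choice(data)                    # SGD: one random point per step
    p = sigmoid(w[0]*x[0] + w[1]*x[1] + b)
    err = p - y
    w = [wi - eta*(err*xi + lam*wi) for wi, xi in zip(w, x)]
    b -= eta*err

# Points with P > 0.5 are predicted class 1 (Blue), otherwise class 0 (Red)
correct = sum((sigmoid(w[0]*x[0] + w[1]*x[1] + b) > 0.5) == (y == 1) for x, y in data)
print(f"accuracy: {correct}/{len(data)}")
```

On separable data like this, the boundary settles between the clusters within a few dozen steps; the λ term keeps the weights from growing without bound.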

[Interactive canvas: live stats (epoch, log loss, accuracy, weights), legend, and training controls appear here.]

Interactive Controls

  • Left-Click: Add a Red point (Class 0).
  • Right-Click (or Shift+Click): Add a Blue point (Class 1).
  • Drag: Click and drag any point to move it. Watch how the boundary adapts!
  • Double-Click: Remove the nearest point.
  • Start/Stop Training: Toggle the gradient descent optimization process.
  • Step Fwd ▶: Execute a single SGD training step with full calculation details shown in the Step Details panel.
  • ◀ Step Back: Rewind to the previous training step.
  • Reset Model: Randomize the weights while keeping data points.
  • Clear All: Remove all points and reset the model.
  • Learning Rate (η): Controls step size in gradient descent. Higher = faster but may overshoot.
  • Regularization (λ): L2 penalty to prevent overfitting. Higher = simpler model.
  • Preset: Load different data configurations to explore the sigmoid behavior:
    • Sigmoid Demo (1D): Points arranged horizontally — best for seeing the classic S-curve probability transition from 0→1.
    • Probability Gradient: Points at varying distances from boundary — shows how confidence changes with distance.
    • Uncertainty Zone: All points clustered near center — demonstrates predictions near P=0.5.
    • Confidence Contrast: Mix of far (confident) and near (uncertain) points.
    • Diagonal Separation: Shows 2D probability gradient across a diagonal boundary.
    • Overlapping Classes: Non-separable data — model must find best compromise.

Understanding the Visualization

  • Orange Line: The decision boundary where P(y=1) = 0.5 (i.e., w·x + b = 0).
  • Background Gradient: Probability map — red regions predict Class 0, blue regions predict Class 1.
  • White Highlight: The currently selected point being processed in step-by-step training mode.
  • Probability Contours: Optional lines showing where P = 0.1, 0.25, 0.75, 0.9.
  • Sigmoid Plot (bottom-right inset): THE ACTUAL S-CURVE! Shows P vs. z (the logit, proportional to the signed distance from the boundary). Each data point is plotted on the curve — you can SEE the sigmoid function mapping z values to probabilities!

Visualization Options

  • Show Probability Regions: Toggle the red-blue gradient background showing the probability landscape.
  • Show Probability Contours: Display iso-probability lines (P = 0.1, 0.25, 0.75, 0.9) — these are always parallel to the decision boundary!
  • Show Sigmoid Plot: Toggle the S-curve inset that visualizes how z (logit) maps to probability P.

Step Details Panel

When using Step Fwd ▶, the Step Details panel shows the complete calculation for each gradient descent update:

  • Selected Point: Which point was randomly chosen for this update (highlighted on canvas).
  • Logit & Sigmoid: How z = w·x + b is computed, then transformed to probability via σ(z).
  • Error Calculation: The difference between predicted probability and true label.
  • Log Loss: The cross-entropy loss L = −[y·log(p) + (1−y)·log(1−p)].
  • Gradient Calculation: Partial derivatives ∂L/∂wx, ∂L/∂wy, ∂L/∂b.
  • Weight Updates: Step-by-step showing w_new = w − η × gradient.
  • Decision Boundary: The resulting line equation y = (slope)·x + (intercept).

Applications of Logistic Regression

  • Medical Diagnosis: Predicting disease probability from symptoms
  • Credit Scoring: Probability of loan default
  • Spam Detection: Probability an email is spam
  • Click Prediction: Probability a user clicks an ad
  • Churn Prediction: Probability a customer will leave

Tips for Using This Simulation

  • Use Step-by-Step Mode: Click Step Fwd ▶ to see exactly how each gradient descent update works. Watch the probability calculation!
  • Watch the Probability: Notice how each point gets a probability estimate, shown when highlighted (P=0.XX).
  • Compare to SVM: Unlike SVM where only "Support Vectors" matter, here EVERY point contributes to the gradient.
  • Try High Learning Rate: Set η = 2.0 and watch the boundary oscillate — too aggressive!
  • Try Sigmoid Demo: Use the "Sigmoid Demo (1D)" preset to see the classic S-curve probability transition.
  • Try Uncertainty Zone: See how the model handles points all clustered near P=0.5.
  • Watch the Log Loss: As training progresses, log loss should decrease. If it increases, learning rate may be too high.
  • Enable Probability Contours: Check "Show Probability Contours" to see lines where P = 0.1, 0.25, 0.75, 0.9.
  • Watch the Sigmoid Plot: The inset in the bottom-right shows the actual S-curve! Watch how points move along the curve as training progresses.