Logistic Classification Tutorial 

This interactive tutorial demonstrates Logistic Regression, one of the most fundamental classification algorithms in machine learning. Unlike linear regression, which predicts continuous values, Logistic Regression predicts probabilities using the sigmoid function, making it ideal for binary classification tasks.

Click on the canvas to add data points, then watch as the model learns to separate them. The orange line is the decision boundary where P = 0.5. The background color gradient shows the predicted probability — more red means higher probability of Class 0, more blue means higher probability of Class 1.

Mathematical Foundation

Logistic Regression is a linear model for classification that predicts probabilities using the sigmoid function. Despite its name, it's used for classification, not regression!

The Sigmoid (Logistic) Function:

σ(z) = 1 / (1 + e^(−z))

The sigmoid function maps any real number to a value between 0 and 1, making it perfect for probability estimation. When z → +∞, σ(z) → 1. When z → −∞, σ(z) → 0.
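The asymptotic behavior described above is easy to verify directly. A minimal sketch of the sigmoid in plain Python:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5 — exactly on the decision boundary
print(sigmoid(10))   # ~0.99995 — as z → +∞, σ(z) → 1
print(sigmoid(-10))  # ~0.00005 — as z → −∞, σ(z) → 0
```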

The Model:

P(y=1|x) = σ(w · x + b)

The probability of class 1 is the sigmoid of the linear combination of inputs. The decision boundary is where P = 0.5, which occurs when w · x + b = 0.

The Log Loss (Binary Cross-Entropy):

L = −[y · log(p) + (1−y) · log(1−p)]

This loss function heavily penalizes confident wrong predictions. If the true label is 1 and p is close to 0, log(p) → −∞, making the loss very high.
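To see the asymmetry concretely, here is a small sketch evaluating the binary cross-entropy at a few predicted probabilities for a true label of 1:

```python
import math

def log_loss(y: int, p: float) -> float:
    """Binary cross-entropy for one point: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(log_loss(1, 0.99))  # ~0.01  — confident and correct: tiny loss
print(log_loss(1, 0.5))   # ~0.693 — uncertain: moderate loss (ln 2)
print(log_loss(1, 0.01))  # ~4.61  — confident and wrong: huge loss
```

As p → 0 with true label 1, the loss grows without bound, which is exactly the "heavy penalty" described above.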

The Sigmoid Plot — Seeing the Actual S-Curve

The Sigmoid Plot (bottom-right inset) is the key to understanding Logistic Regression! It shows:

  • X-axis (z): The "logit" value z = w·x + b — proportional to the signed distance from the decision boundary.
  • Y-axis (P): The predicted probability from 0 to 1.
  • Green S-Curve: The sigmoid function σ(z) = 1/(1+e^(−z)) — this IS the logistic curve!
  • Data Points: Each point is plotted ON the curve based on its z value.

What you'll observe:

  • Points with z << 0 (far left on curve) → P ≈ 0 (confident Red prediction)
  • Points with z ≈ 0 (center) → P ≈ 0.5 (uncertain, on the boundary)
  • Points with z >> 0 (far right on curve) → P ≈ 1 (confident Blue prediction)
  • As training progresses, watch points slide along the S-curve!

Key insight: The 2D decision boundary is always a straight line. The sigmoid doesn't change the boundary shape — it maps the linear output z to a probability. The S-curve visualizes this mapping!

Log Loss vs. Hinge Loss (Comparison with SVM)

The key difference between Logistic Regression and SVM lies in their loss functions:

  • Log Loss (Logistic): L = −log(p) for the true class — always positive, even on correct predictions. Wants MORE confidence forever.
  • Hinge Loss (SVM): L = max(0, 1 − margin) — zero loss once margin ≥ 1. Stops caring after sufficient margin.

Consequence: In Logistic Regression, ALL points contribute to the gradient. In SVM, only Support Vectors (points near the boundary) matter.

Note: SVM does NOT use a step function — that's the Perceptron! SVM uses piece-wise linear hinge loss.
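The two losses can be put on the same axis using the ±1 label convention, where the margin is m = y·z and the log loss becomes log(1 + e^(−m)) — a standard identity. A minimal comparison sketch:

```python
import math

def log_loss_margin(m: float) -> float:
    """Log loss written in terms of the margin m = y*z, with y in {-1, +1}."""
    return math.log(1.0 + math.exp(-m))

def hinge_loss(m: float) -> float:
    """Hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - m)

for m in [-2, 0, 1, 2, 5]:
    print(f"margin={m:+}: log={log_loss_margin(m):.4f}  hinge={hinge_loss(m):.4f}")
# Hinge is exactly 0 for m >= 1; log loss keeps shrinking but never reaches 0,
# which is why every point keeps contributing to the gradient in Logistic Regression.
```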

Gradient Descent

The gradients of log loss are elegantly simple:

∂L/∂w = (p − y) · x
∂L/∂b = (p − y)

where p = σ(w·x + b) is the predicted probability and y is the true label (0 or 1).

The update rule with regularization:

w ← w − η[(p − y)x + λw]
b ← b − η(p − y)
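The update rule above can be sketched as a single SGD step. This is a minimal illustration (the simulation's internal code may differ); the point x = [1.0, 2.0] and the hyperparameter values are hypothetical:

```python
import math

def sgd_step(w, b, x, y, eta=0.5, lam=0.01):
    """One stochastic gradient descent update for a single point (x, y)."""
    z = w[0]*x[0] + w[1]*x[1] + b      # logit
    p = 1.0 / (1.0 + math.exp(-z))     # predicted probability sigma(z)
    err = p - y                        # (p - y) appears in both gradients
    w = [w[0] - eta*(err*x[0] + lam*w[0]),   # w <- w - eta[(p-y)x + lam*w]
         w[1] - eta*(err*x[1] + lam*w[1])]
    b = b - eta*err                          # b <- b - eta(p-y)
    return w, b

w, b = [0.0, 0.0], 0.0
w, b = sgd_step(w, b, x=[1.0, 2.0], y=1)
print(w, b)  # weights and bias all move positive, toward classifying (1, 2) as class 1
```

With zero initial weights, p = 0.5 and err = −0.5, so each parameter moves in proportion to its input — exactly the (p − y)·x form of the gradient.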

Why Is the Decision Boundary Still a Straight Line?

A common question: "If we use a curved sigmoid function, why is the boundary straight?" The answer:

  • The decision boundary is where P = 0.5, which occurs when σ(z) = 0.5
  • This happens when z = 0, i.e., when w·x + b = 0
  • The equation w·x + b = 0 is a linear equation — always a straight line in 2D!

The sigmoid doesn't curve the boundary — it curves the probability transition from 0 to 1 as you move away from the boundary. Look at the Sigmoid Plot to see this transition!
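In 2D with w = (wx, wy), solving wx·x + wy·y + b = 0 for y makes the straight-line claim explicit. A small sketch with hypothetical weight values:

```python
# The boundary wx*x + wy*y + b = 0 rearranges to y = -(wx/wy)*x - b/wy,
# a straight line whenever wy != 0.
wx, wy, b = 1.5, -2.0, 0.5   # hypothetical trained parameters
slope = -wx / wy             # 0.75
intercept = -b / wy          # 0.25
print(f"decision boundary: y = {slope}x + {intercept}")
```

No matter how the sigmoid warps probabilities, this rearrangement only ever involves the linear part — hence the straight line.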

Why Logistic Regression Works

  • Probabilistic Output: Unlike SVM which gives class labels, Logistic Regression gives you probability estimates. You know not just WHAT the prediction is, but HOW CONFIDENT the model is.
  • Convex Optimization: Log loss is convex, guaranteeing a unique global minimum. No local minima traps!
  • Interpretable Coefficients: Each weight wi tells you how much feature xi affects the log-odds of class 1.
  • Smooth Gradients: Unlike Perceptron's discrete jumps, sigmoid provides smooth gradients that guide optimization reliably.

The Log-Odds (Logit) Interpretation

The linear combination z = w·x + b is the log-odds (also called logit):

z = log(P / (1−P))

This means:

  • z = 0 → P = 0.5 (50/50 odds, on the decision boundary)
  • z = 1 → P ≈ 0.73 (odds of about 2.7:1 for class 1)
  • z = 2 → P ≈ 0.88 (odds of about 7.4:1 for class 1)
  • z = −1 → P ≈ 0.27 (odds of about 2.7:1 for class 0)
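The table above can be checked by applying the sigmoid and then inverting it — the logit log(P/(1−P)) recovers z exactly:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z in [0.0, 1.0, 2.0, -1.0]:
    p = sigmoid(z)
    logit = math.log(p / (1.0 - p))  # inverts the sigmoid: equals z
    print(f"z={z:+.0f}  P={p:.2f}  log(P/(1-P))={logit:+.2f}")
```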

Logistic Regression vs. SVM vs. Perceptron

Aspect            | Perceptron             | Logistic Regression        | SVM
------------------+------------------------+----------------------------+-------------------------
Activation        | Step function: sign(z) | Sigmoid: σ(z)              | Identity (just z)
Output            | Class label {−1, +1}   | Probability (0 to 1)       | Class label (+1 or −1)
Loss Function     | 0/1 misclassification  | Log Loss (Cross-Entropy)   | Hinge Loss
Points Used       | Only misclassified     | ALL points contribute      | Only Support Vectors
Goal              | Find any separator     | Maximize likelihood        | Maximize margin
Gradient Behavior | Jumps discretely       | Always smooth, never zero  | Zero when margin > 1
Best For          | Simple, separable data | Need probability estimates | High-dimensional, sparse

Key takeaway: Logistic Regression is the only one that outputs probabilities. The sigmoid curve maps the linear output to a probability — that's its unique strength!
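Everything above fits in a few lines. A minimal end-to-end training sketch on hypothetical toy data (two well-separated 2D clusters), mirroring the simulation's SGD loop:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: class 0 clusters near (1, 1), class 1 near (3, 3)
data = [([1.0, 1.2], 0), ([0.8, 1.0], 0), ([1.2, 0.9], 0),
        ([3.0, 2.8], 1), ([3.2, 3.1], 1), ([2.9, 3.0], 1)]

w, b, eta, lam = [0.0, 0.0], 0.0, 0.5, 0.01
random.seed(0)
for step in range(200):
    x, y = random.choice(data)                    # SGD: one random point per step
    p = sigmoid(w[0]*x[0] + w[1]*x[1] + b)
    err = p - y
    w = [wi - eta*(err*xi + lam*wi) for wi, xi in zip(w, x)]
    b -= eta*err

# Points with P > 0.5 are predicted class 1 (Blue), otherwise class 0 (Red)
correct = sum((sigmoid(w[0]*x[0] + w[1]*x[1] + b) > 0.5) == (y == 1) for x, y in data)
print(f"accuracy: {correct}/{len(data)}")
```

On separable data like this, the boundary settles between the clusters within a few dozen steps; the λ term keeps the weights from growing without bound.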

[Interactive canvas: live stats (epoch, log loss, accuracy, weights), legend, and training controls appear here.]

Interactive Controls

  • Left-Click: Add a Red point (Class 0).
  • Right-Click (or Shift+Click): Add a Blue point (Class 1).
  • Drag: Click and drag any point to move it. Watch how the boundary adapts!
  • Double-Click: Remove the nearest point.
  • Start/Stop Training: Toggle the gradient descent optimization process.
  • Step Fwd ▶: Execute a single SGD training step with full calculation details shown in the Step Details panel.
  • ◀ Step Back: Rewind to the previous training step.
  • Reset Model: Randomize the weights while keeping data points.
  • Clear All: Remove all points and reset the model.
  • Learning Rate (η): Controls step size in gradient descent. Higher = faster but may overshoot.
  • Regularization (λ): L2 penalty to prevent overfitting. Higher = simpler model.
  • Preset: Load different data configurations to explore the sigmoid behavior:
    • Sigmoid Demo (1D): Points arranged horizontally — best for seeing the classic S-curve probability transition from 0→1.
    • Probability Gradient: Points at varying distances from boundary — shows how confidence changes with distance.
    • Uncertainty Zone: All points clustered near center — demonstrates predictions near P=0.5.
    • Confidence Contrast: Mix of far (confident) and near (uncertain) points.
    • Diagonal Separation: Shows 2D probability gradient across a diagonal boundary.
    • Overlapping Classes: Non-separable data — model must find best compromise.

Understanding the Visualization

  • Orange Line: The decision boundary where P(y=1) = 0.5 (i.e., w·x + b = 0).
  • Background Gradient: Probability map — red regions predict Class 0, blue regions predict Class 1.
  • White Highlight: The currently selected point being processed in step-by-step training mode.
  • Probability Contours: Optional lines showing where P = 0.1, 0.25, 0.75, 0.9.
  • Sigmoid Plot (bottom-right inset): THE ACTUAL S-CURVE! Shows P vs. z (the logit, proportional to the signed distance from the boundary). Each data point is plotted on the curve — you can SEE the sigmoid function mapping z values to probabilities!

Visualization Options

  • Show Probability Regions: Toggle the red-blue gradient background showing the probability landscape.
  • Show Probability Contours: Display iso-probability lines (P = 0.1, 0.25, 0.75, 0.9) — these are always parallel to the decision boundary!
  • Show Sigmoid Plot: Toggle the S-curve inset that visualizes how z (logit) maps to probability P.

Step Details Panel

When using Step Fwd ▶, the Step Details panel shows the complete calculation for each gradient descent update:

  • Selected Point: Which point was randomly chosen for this update (highlighted on canvas).
  • Logit & Sigmoid: How z = w·x + b is computed, then transformed to probability via σ(z).
  • Error Calculation: The difference between predicted probability and true label.
  • Log Loss: The cross-entropy loss L = −[y·log(p) + (1−y)·log(1−p)].
  • Gradient Calculation: Partial derivatives ∂L/∂wx, ∂L/∂wy, ∂L/∂b.
  • Weight Updates: Step-by-step showing w_new = w − η × gradient.
  • Decision Boundary: The resulting line equation y = (slope)·x + (intercept).

Applications of Logistic Regression

  • Medical Diagnosis: Predicting disease probability from symptoms
  • Credit Scoring: Probability of loan default
  • Spam Detection: Probability an email is spam
  • Click Prediction: Probability a user clicks an ad
  • Churn Prediction: Probability a customer will leave

Tips for Using This Simulation

  • Use Step-by-Step Mode: Click Step Fwd ▶ to see exactly how each gradient descent update works. Watch the probability calculation!
  • Watch the Probability: Notice how each point gets a probability estimate, shown when highlighted (P=0.XX).
  • Compare to SVM: Unlike SVM where only "Support Vectors" matter, here EVERY point contributes to the gradient.
  • Try High Learning Rate: Set η = 2.0 and watch the boundary oscillate — too aggressive!
  • Try Sigmoid Demo: Use the "Sigmoid Demo (1D)" preset to see the classic S-curve probability transition.
  • Try Uncertainty Zone: See how the model handles points all clustered near P=0.5.
  • Watch the Log Loss: As training progresses, log loss should decrease. If it increases, learning rate may be too high.
  • Enable Probability Contours: Check "Show Probability Contours" to see lines where P = 0.1, 0.25, 0.75, 0.9.
  • Watch the Sigmoid Plot: The inset in the bottom-right shows the actual S-curve! Watch how points move along the curve as training progresses.