Web Simulation

Softmax

Softmax turns model scores into probabilities. The raw scores are called logits. A logit can be any real number, but the final softmax outputs are all positive and sum to 1.

Mathematical Foundation

For class i, softmax is:

Pi = exp(zi / T) / Σj exp(zj / T)

zi is the logit for class i. T is the temperature, which must be positive. Lower temperature makes the distribution sharper. Higher temperature makes it more even.
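
The formula above can be sketched directly in Python. This is a minimal illustration, not the simulator's code; the function name softmax and its temperature argument are assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Direct form: P_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Exponentiate each temperature-scaled logit.
    exps = [math.exp(z / temperature) for z in logits]
    # Normalize so the outputs sum to 1.
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1, -1.0], temperature=1.0)
print([round(p, 2) for p in probs])  # [0.64, 0.23, 0.1, 0.03]
```

Every output is positive because exp is positive, and the division by the total is what makes the outputs sum to 1.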

Numerically Stable Form

The simulator uses the equivalent stable form:

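Pi = exp(zi / T - max) / Σj exp(zj / T - max)

Here max is the largest scaled logit, max = maxj (zj / T). Subtracting it multiplies the numerator and the denominator by the same factor exp(-max), which cancels, so the final probabilities do not change. The largest exponent becomes 0, so no exponential exceeds 1 and overflow cannot occur.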
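
A short sketch can show why the stable form matters. The function names softmax_naive and softmax_stable are hypothetical, and the large logits are an assumption chosen to force overflow; they lie far outside the simulator's -5 to +5 range.

```python
import math

def softmax_naive(logits, temperature=1.0):
    # Direct form: may overflow when z / T is large.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(logits, temperature=1.0):
    # Subtract the largest scaled logit before exponentiating.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # every value is in (0, 1]
    total = sum(exps)
    return [e / total for e in exps]

small = [2.0, 1.0, 0.1, -1.0]
# On moderate inputs both forms give the same probabilities.
agree = all(math.isclose(a, b)
            for a, b in zip(softmax_naive(small), softmax_stable(small)))

# Same gaps as `small`, shifted up by 998 (hypothetical extreme input).
large = [1000.0, 999.0, 998.1, 997.0]
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:  # math.exp(1000.0) is out of range for a float
    naive_overflowed = True

stable_large = softmax_stable(large)
print(agree, naive_overflowed)
print([round(p, 2) for p in stable_large])
```

The stable form returns the same probabilities for large as for small, because only the gaps between logits enter the result.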

Concrete Example

Assume logits are [2.00, 1.00, 0.10, -1.00] and T = 1.00.

Scaled logits = z / T = [2.00 / 1.00, 1.00 / 1.00, 0.10 / 1.00, -1.00 / 1.00] = [2.00, 1.00, 0.10, -1.00]

max = max([2.00, 1.00, 0.10, -1.00]) = 2.00

Shifted logits = scaled - max = [2.00 - 2.00, 1.00 - 2.00, 0.10 - 2.00, -1.00 - 2.00] = [0.00, -1.00, -1.90, -3.00]

Exponentials = [exp(0.00), exp(-1.00), exp(-1.90), exp(-3.00)] = [1.00, 0.37, 0.15, 0.05]

D = sum of exponentials = 1.00 + 0.37 + 0.15 + 0.05 = 1.57

P0 = exp(z0 / T - max) / D = exp(2.00 / 1.00 - 2.00) / 1.57 = 0.64

P1 = exp(z1 / T - max) / D = exp(1.00 / 1.00 - 2.00) / 1.57 = 0.23

P2 = exp(z2 / T - max) / D = exp(0.10 / 1.00 - 2.00) / 1.57 = 0.10

P3 = exp(z3 / T - max) / D = exp(-1.00 / 1.00 - 2.00) / 1.57 = 0.03

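Class 0 has the highest probability because its logit is the largest. The other classes still keep nonzero probabilities. The intermediate values above are rounded to two decimals, while the probabilities come from the unrounded values, so redoing the division with the rounded figures can differ in the last digit (for example, 0.37 / 1.57 gives 0.24 rather than 0.23).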
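
The worked example can be reproduced step by step with a few lines of Python. This is an illustrative sketch; the variable names are assumptions, and the printed columns simply mirror the quantities computed above.

```python
import math

logits = [2.00, 1.00, 0.10, -1.00]
T = 1.00

scaled = [z / T for z in logits]         # temperature scaling
m = max(scaled)                          # largest scaled logit
shifted = [s - m for s in scaled]        # subtract max for stability
exps = [math.exp(s) for s in shifted]    # exponentials
D = sum(exps)                            # denominator
probs = [e / D for e in exps]            # final probabilities

print(f"{'Class':>5} {'z':>6} {'z / T':>6} {'exp(z / T - max)':>17} {'P':>5}")
for i, (z, s, e, p) in enumerate(zip(logits, scaled, exps, probs)):
    print(f"{i:>5} {z:>6.2f} {s:>6.2f} {e:>17.2f} {p:>5.2f}")
print(f"D = {D:.2f}")
```

Rounding happens only when printing, so the probabilities are computed from the unrounded exponentials.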

Simulation

The interactive simulator is below. Use the controls to explore the concepts described above.

[Interactive simulator: a numeric readout (1.00); panels for Logits, Exponentials exp(z / T - max), and Probabilities; a calculation table with columns Class, z, z / T, exp(z / T - max), and P.]

Usage Instructions

Use the preset menu to start from a common softmax situation. Then adjust each logit with its slider or number box. The exponentials, probability bars, table, and formulas update immediately.

Select a class to inspect its probability calculation. For example, selecting class 2 highlights P2 and shows the calculation as a chain: symbolic formula = formula with values plugged in = result.

Use Step Bwd, Step Fwd, or Run to walk through the mechanism: logits, temperature scaling, exponentials, denominator normalization, and final selected probability.

What To Notice

The class with the largest logit always gets the largest probability (tied logits share it), but how large that probability is depends on the gaps between logits, not on their absolute values.

If all logits are shifted by the same amount, the probabilities do not change. Softmax cares about relative differences.
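
Both observations can be checked numerically. The sketch below is illustrative only; the softmax helper, the shift of 3.0, and the doubled logits are assumptions chosen for the demonstration.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

base = [2.0, 1.0, 0.1, -1.0]

# Shift every logit by the same amount: gaps are unchanged.
shifted = [z + 3.0 for z in base]
p_base = softmax(base)
p_shifted = softmax(shifted)
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(p_base, p_shifted))
print(same)  # True

# Double every logit: same ordering, but the gaps are twice as wide.
doubled = [2.0 * z for z in base]
p_doubled = softmax(doubled)
print(round(p_base[0], 2), round(p_doubled[0], 2))
```

Doubling the gaps raises the top class's probability even though the ordering of the classes is unchanged.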

Temperature changes confidence. At low temperature, the largest logit dominates. At high temperature, the probabilities move closer to a uniform distribution.
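
A small sweep makes the temperature effect concrete. This is a sketch under the assumption that the logits from the concrete example are reused; the three temperatures are the simulator's stated minimum, its default-looking value of 1.0, and its stated maximum.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0]
tops = []
for T in (0.2, 1.0, 5.0):
    probs = softmax(logits, T)
    tops.append(probs[0])  # probability of the largest-logit class
    print(f"T = {T:.1f}: " + ", ".join(f"{p:.2f}" for p in probs))
```

With these logits, the top probability falls from above 0.99 at T = 0.2 to roughly 0.33 at T = 5.0, where the distribution is much closer to uniform (0.25 per class).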

Parameters

Logits z0...z3: raw model scores from -5 to +5.

Temperature T: the divisor applied to every logit in z / T. The simulator allows 0.2 to 5.0.

Selected class: class used for the detailed probability formula panel.

Limitations

  • Fixed 4-class example. The lab shows softmax over exactly four logits so the arithmetic stays readable. Real classifiers apply softmax over hundreds or thousands of classes; the same formula holds, but the probability mass is often spread over many more classes.
  • Logits are set by hand. You type or drag the logits directly; they do not come from a trained network, so the demo shows the transform, not how a model learns to produce good scores.
  • Temperature only. The single tunable beyond the logits is temperature T. Related variants — log-softmax, label smoothing, top-k/nucleus sampling, and the softmax+cross-entropy gradient used in training — are not shown.
  • Displayed values are rounded. Probabilities and exponentials are shown to two decimals for clarity, so the displayed probabilities may appear to sum to slightly more or less than 1.00.
  • No numerical edge cases explored. The stable (subtract-max) form is used internally, but overflow/underflow and very large |z| / very small T behaviour are not demonstrated as failure modes.
  • Teaching tool. Built to make the logits→probabilities mapping and the role of temperature intuitive, not to benchmark a classifier.
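
The rounding limitation above can be seen with a quick check. This is an illustrative sketch; the logits are a hypothetical choice within the simulator's range that makes the two-decimal values sum to 0.99.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.0, 0.0, 0.0, -5.0]
probs = softmax(logits)

# Unrounded probabilities sum to 1 up to floating-point error.
exact_sum = sum(probs)

# Rounding each value to two decimals loses a little mass.
rounded = [round(p, 2) for p in probs]
rounded_sum = round(sum(rounded), 2)

print(rounded, rounded_sum)  # [0.33, 0.33, 0.33, 0.0] 0.99
```

The underlying probabilities still sum to 1; only the displayed two-decimal values fall short.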