Softmax turns model scores into probabilities. The raw scores are called logits. A logit can be any real number, but the softmax outputs are all positive and sum to 1.

Mathematical Foundation

For class i, softmax is:

$$P_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
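Here $z_i$ is the logit for class i and $T$ is the temperature, which must be positive. Lower temperature makes the distribution sharper. Higher temperature makes it more even.

Numerically Stable Form

The simulator uses the equivalent stable form:

$$P_i = \frac{\exp(z_i / T - \mathrm{max})}{\sum_j \exp(z_j / T - \mathrm{max})}$$

where max is the largest scaled logit, $\max_j (z_j / T)$.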
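The stable form can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the simulator's actual code: the function name `softmax` and its signature are assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature, computed in the max-shifted (stable) form."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]    # z_i / T
    shift = max(scaled)                           # the "max" term in the formula
    exps = [math.exp(s - shift) for s in scaled]  # every exponent is <= 0
    total = sum(exps)                             # denominator
    return [e / total for e in exps]

# Calling math.exp(1000.0) directly raises OverflowError,
# but the shifted form handles logits of that size.
probs = softmax([1000.0, 999.0, 998.0])
```

Because only differences between logits matter, `softmax([1000.0, 999.0, 998.0])` gives the same distribution as `softmax([2.0, 1.0, 0.0])`, up to floating-point rounding.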
Subtracting max does not change the final probabilities, because the common factor exp(-max) appears in both the numerator and the denominator and cancels. It only prevents very large exponentials.

Concrete Example

Assume logits are [2.00, 1.00, 0.10, -1.00] and T = 1.00.

Scaled logits = z / T = [2.00 / 1.00, 1.00 / 1.00, 0.10 / 1.00, -1.00 / 1.00] = [2.00, 1.00, 0.10, -1.00]
max = max([2.00, 1.00, 0.10, -1.00]) = 2.00
Shifted logits = scaled - max = [2.00 - 2.00, 1.00 - 2.00, 0.10 - 2.00, -1.00 - 2.00] = [0.00, -1.00, -1.90, -3.00]
Exponentials = [exp(0.00), exp(-1.00), exp(-1.90), exp(-3.00)] = [1.00, 0.37, 0.15, 0.05]
D = sum of exponentials = 1.00 + 0.37 + 0.15 + 0.05 = 1.57

P0 = exp(z0 / T - max) / D = exp(2.00 / 1.00 - 2.00) / 1.57 = 0.64
P1 = exp(z1 / T - max) / D = exp(1.00 / 1.00 - 2.00) / 1.57 = 0.23
P2 = exp(z2 / T - max) / D = exp(0.10 / 1.00 - 2.00) / 1.57 = 0.10
P3 = exp(z3 / T - max) / D = exp(-1.00 / 1.00 - 2.00) / 1.57 = 0.03

So class 0 has the highest probability because its logit is the largest. The other classes still keep nonzero probabilities.

Simulation

The interactive simulator is below. Use the controls to explore the concepts described above.
Simulator panels: Logits; Exponentials: exp(z / T - max); Probabilities; Calculation table.
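The concrete example above can also be reproduced outside the simulator. The following is a minimal Python sketch that walks through the same stages (scaling, shifting, exponentials, denominator, probabilities). The variable names are illustrative and are not taken from the simulator.

```python
import math

logits = [2.00, 1.00, 0.10, -1.00]
T = 1.00

scaled = [z / T for z in logits]       # temperature scaling: z / T
m = max(scaled)                        # largest scaled logit
shifted = [s - m for s in scaled]      # scaled - max
exps = [math.exp(s) for s in shifted]  # exp(z / T - max)
D = sum(exps)                          # denominator
probs = [e / D for e in exps]          # normalized probabilities

def fmt(values):
    """Format a list of numbers to two decimals, as in the worked example."""
    return "[" + ", ".join(f"{v:.2f}" for v in values) + "]"

print("scaled       =", fmt(scaled))
print("shifted      =", fmt(shifted))
print("exponentials =", fmt(exps))
print(f"D            = {D:.2f}")
print("probs        =", fmt(probs))
```

Changing `T` to a smaller value such as 0.5 and re-running shows the sharper distribution described earlier. A larger value such as 5.0 flattens it.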
Usage Instructions

Use the preset menu to start from a common softmax situation. Then adjust each logit with its slider or number box. The exponentials, probability bars, table, and formulas update immediately.

Select a class to inspect its probability calculation. For example, selecting class 2 highlights P2 and shows the chain: symbolic formula = plugged-in formula = result.

Use Step Bwd, Step Fwd, or Run to walk through the mechanism: logits, temperature scaling, exponentials, denominator normalization, and final selected probability.

What To Notice

The class with the largest logit always gets the largest probability, but how large that probability is depends on the gaps between logits, not on their absolute values.

If all logits are shifted by the same amount, the probabilities do not change. Softmax cares only about relative differences.

Temperature changes confidence. At low temperature, the largest logit dominates. At high temperature, the probabilities move closer to a uniform distribution.

Parameters

Logits z0...z3: raw model scores from -5 to +5.
Temperature T: the divisor in z / T. The simulator allows 0.2 to 5.0.
Selected class: the class used for the detailed probability formula panel.

Limitations