This interactive tutorial visualizes the backpropagation algorithm in a "glass-box" neural network. Watch data and gradients flow through the network as animated packets, see the chain rule calculations in real time, and explore critical concepts like vanishing gradients and dying ReLU.

## What You See

- **Network View:** A 2-2-1 neural network (2 inputs, 2 hidden neurons, 1 output) with weighted connections. Edges change thickness based on weight magnitude and color (blue = positive, red = negative). Green packets show gradients flowing backward during backpropagation. Click on any connection to see the chain rule breakdown.
- **Loss Chart:** A real-time plot of training loss vs. epoch, with x-axis tick marks showing epoch numbers. Watch how different problems (AND gate vs. XOR gate) affect convergence. Lock icons (🔒) appear on the chart when neurons become locked during training.
- **Chain Rule Panel:** When you click on a connection, this panel shows the exact gradient calculation: ∂L/∂w = δ × a, where δ is the error term at the target neuron and a is the activation at the source neuron. The selection persists so you can track a specific weight during training.

## Key Concepts

- **Forward Pass:** Inputs propagate through the network. Each neuron computes z = Σ(w·a) + b, then applies the activation function a = σ(z).
- **Backward Pass:** Gradients flow backward from the output. The output delta is computed from the loss derivative, then propagated to hidden layers using the chain rule: δ_hidden = (Σ δ_next × w) × σ'(z).

## Understanding the Delta Calculation

**Important:** The slope (σ') is *not* a function of the error. The slope is a function of the neuron's current value (z or a). It doesn't care whether the answer is "right" or "wrong" (the error); it only cares whether the neuron is "active" or "saturated."

1. **The Raw Error (a − y):** how far the prediction is from the target, i.e., whether the answer is "right" or "wrong."
2. **The Slope (σ'):** how responsive the neuron currently is, i.e., whether it is "active" or "saturated."
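The forward pass, both delta formulas, and the chain rule panel's ∂L/∂w = δ × a can be traced end to end for the same 2-2-1 architecture. This is a minimal sketch, not the tutorial's actual source: the weights, inputs, and target are made-up values, and squared-error loss is assumed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, W1, b1, W2, b2, y):
    """Gradients for a 2-2-1 sigmoid network with squared-error loss
    L = 0.5 * (a_out - y)**2. All parameter values are hypothetical."""
    # Forward pass: each neuron computes z = sum(w*a) + b, then a = sigmoid(z).
    z_h = [sum(W1[j][i] * x[i] for i in range(2)) + b1[j] for j in range(2)]
    a_h = [sigmoid(z) for z in z_h]
    z_o = sum(W2[j] * a_h[j] for j in range(2)) + b2
    a_o = sigmoid(z_o)
    loss = 0.5 * (a_o - y) ** 2

    # Backward pass.
    # Output delta: raw error (a - y) times slope sigma'(z) = a * (1 - a).
    delta_o = (a_o - y) * a_o * (1.0 - a_o)
    # Hidden delta: backprop sum (w * delta_next) times the neuron's own slope.
    delta_h = [W2[j] * delta_o * a_h[j] * (1.0 - a_h[j]) for j in range(2)]

    # Chain rule for any weight: dL/dw = delta(target neuron) * a(source neuron).
    grad_W2 = [delta_o * a_h[j] for j in range(2)]
    grad_W1 = [[delta_h[j] * x[i] for i in range(2)] for j in range(2)]
    return loss, grad_W1, grad_W2

# One step with made-up numbers.
loss, grad_W1, grad_W2 = forward_backward(
    [1.0, 0.0],                  # inputs
    [[0.5, -0.3], [0.8, 0.2]],   # input -> hidden weights
    [0.1, -0.1],                 # hidden biases
    [0.4, -0.6],                 # hidden -> output weights
    0.05,                        # output bias
    1.0,                         # target
)
```

A useful sanity check on any backprop sketch like this is a finite-difference comparison: nudging one weight by a tiny ε and re-running the forward pass should change the loss by approximately ε × (the gradient backprop computed).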
**Why this distinction matters (the "Aha!" moment):** In the "Vanishing Gradient" demo, the error can be huge (e.g., prediction 0.99 vs. target 0.0), but if the slope is near zero, the gradient dies anyway.
So the slope is purely a physical property of the neuron's current state, independent of the correct answer.

**Technical Note:** Sigmoid has a special property: its derivative can be calculated from the output (a) alone, without recomputing the weighted sum (z): σ'(z) = a × (1 − a). This means:
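The sigmoid shortcut and the saturation effect it reveals can be checked in a few lines. A small illustrative sketch (the sample activations are arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def slope_from_activation(a):
    # Sigmoid shortcut: sigma'(z) = a * (1 - a), computed from the output alone.
    return a * (1.0 - a)

# The slope depends only on the neuron's current state, never on the error:
#   a = 0.5  (mid-range)  -> slope 0.25, the maximum: the neuron learns fastest
#   a = 0.9  (very high)  -> slope 0.09: strong signal, but nearly saturated
#   a = 0.99 (saturated)  -> slope ~0.01: barely moves, no matter the error
for a in (0.5, 0.9, 0.99):
    print(f"a = {a:4} -> slope = {slope_from_activation(a):.4f}")
```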
If a is 0.9 (very high), the slope is 0.09 (very low): even though the signal (a) is strong, the neuron is "maxed out" and won't change much if you push it further. This is the saturation point.

## Output Layer vs. Hidden Layer Delta Calculation

**Why the Delta Calculation row behaves differently:** The "Delta Calculation" row shows different information (or disappears) for hidden layers because the math changes completely once you move away from the output layer.

1. **Output layer (simple):** The error comes directly from the "outside world" (the loss function). Formula: δ_output = (a − y) × σ'(z). This fits perfectly in the 3-box layout: Raw Error × Slope = Delta.
2. **Hidden layer (complex):** The error does not come from a target; it comes from backpropagating the error of the layer ahead. Formula: δ_hidden = (Σ w × δ_next) × σ'(z). Because the hidden delta depends on a sum of weighted errors from the next layer (not a simple subtraction like a − y), it requires a different visualization. When you hover over a hidden-layer connection, you'll see "Backprop Sum (Σwδ)" instead of "Raw Error (a − y)", showing how hidden neurons learn by listening to weighted complaints from the layers ahead of them.

**Lock Icons:** When neurons become "locked" (saturated, with very small activation derivatives), a 🔒 icon appears on the neuron in the network view and on the loss chart at the epoch where the lock occurred. You can adjust the sensitivity with the "Lock Threshold" slider.

**Cross-Entropy Magic:** When using cross-entropy loss with sigmoid, the gradient at the output simplifies to (a − y), canceling out the sigmoid derivative. This prevents vanishing gradients even when neurons are saturated!
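The cross-entropy cancellation is easy to verify numerically. A hedged sketch, assuming a single sigmoid output that predicts a = 0.99 while the target is y = 0 (the numbers are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A badly saturated output neuron: prediction a = 0.99, target y = 0.0.
y = 0.0
z = math.log(0.99 / 0.01)   # the pre-activation with sigmoid(z) = 0.99
a = sigmoid(z)
slope = a * (1.0 - a)       # sigma'(z), roughly 0.0099

# Squared error: dL/dz = (a - y) * sigma'(z); the tiny slope crushes the gradient.
grad_squared_error = (a - y) * slope

# Cross-entropy + sigmoid: the sigma'(z) factor cancels, so dL/dz = a - y.
grad_cross_entropy = a - y
```

Despite an enormous raw error, the squared-error gradient is under 0.01, while the cross-entropy gradient keeps its full strength of roughly 0.99, which is exactly why the saturated neuron keeps learning.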
*Interface panels: Loss Calculation (average loss across all training samples) · Loss vs. Epoch · Activations (a) · Deltas (δ) · Weights (w).*
## Parameters

## Scenarios

## Interactions

## Educational Value

This simulation makes abstract concepts concrete: