Web Simulation

1-Variable Gradient Descent

This interactive tutorial helps you build intuition for how step size (learning rate η) affects gradient descent: convergence, oscillation, or divergence. The simulation uses a single variable x and updates it with x ← x − η·f'(x) at each step.

You can choose from six functions, each illustrating a different challenge: a smooth bowl (ideal convex), local minima (the trap), high-frequency ripples, a flat plateau (vanishing gradient), steep walls (exploding gradient), and a non-differentiable V-shape (endless bouncing). Adjust the learning rate (logarithmic scale 0.001–1.5), optionally enable Momentum or an Auto RL method (AdaGrad, RMSprop, Adam), then use Step Fwd, Step Bwd, or Run to drive the descent. The main canvas shows the function curve, the tangent at the current x, the trajectory, and an update arrow (green if loss decreased, red if overshot). Click on the plot to set a new starting x and reset the path. The Theory and Parameters sections below spell out the update rules and all controls.

 

Theory

Plain gradient descent. We minimize f(x) by repeatedly moving opposite to the gradient: x ← x − η·f'(x). The learning rate η controls step size. Too small → slow convergence; too large → oscillation or divergence.
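As an illustration, the update rule can be sketched in a few lines of Python (a minimal sketch, not the simulation's actual code; the bowl f(x) = x² with gradient 2x is taken from the function list below):

```python
# Plain gradient descent: x <- x - eta * f'(x), repeated for a fixed number
# of steps. The trajectory lets you inspect how quickly x approaches 0.

def grad_descent(x0, eta, grad, steps):
    """Run `steps` updates from x0 and return the visited points."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)   # move opposite to the gradient
        xs.append(x)
    return xs

# On f(x) = x^2 (f'(x) = 2x) with eta = 0.1, each step multiplies x by 0.8,
# so the iterates shrink geometrically toward the minimum at x = 0.
traj = grad_descent(x0=-3.0, eta=0.1, grad=lambda x: 2 * x, steps=50)
```

With η above 1 on this function the factor (1 − 2η) exceeds 1 in magnitude, so the same loop oscillates with growing amplitude and diverges, matching the behavior described above.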

Momentum (Polyak). A velocity term smooths updates and can escape shallow local minima: v ← μv − η·f'(x), then x ← x + v. The coefficient μ ∈ [0, 1] damps past velocity. μ = 0 reduces to plain GD; μ close to 1 carries more history and can help on “ripply” landscapes.
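The momentum rule can be sketched the same way (a hedged example with illustrative values, not the simulation's code):

```python
# Polyak momentum: v <- mu*v - eta*f'(x), then x <- x + v.
# With mu = 0 this reduces exactly to plain gradient descent.

def momentum_descent(x0, eta, mu, grad, steps):
    x, v = x0, 0.0
    for _ in range(steps):
        v = mu * v - eta * grad(x)   # velocity accumulates past gradients
        x = x + v
    return x

# On the bowl f(x) = x^2, a high mu carries history but still settles at 0.
x_final = momentum_descent(x0=-3.0, eta=0.05, mu=0.9,
                           grad=lambda x: 2 * x, steps=200)
```

Note that the velocity can carry the point past a shallow bump even when the local gradient points backward, which is why momentum can escape the local minima mentioned above.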

Adaptive learning rate (Auto RL). These methods scale the effective step size using past gradient information, so different “directions” (here, just sign and magnitude of f') can have different step sizes. When an adaptive method is selected, its parameters appear as sliders; Momentum is disabled. All controls, including Auto RL and its parameters, apply in real time (even during Run).

  • AdaGrad: Accumulates squared gradients G ← G + (f')², then updates x ← x − η·f' / (√G + ε). Large past gradients shrink the step. ε (epsilon) avoids division by zero; typical 1e−8.
  • RMSprop: Exponentially decaying average of squared gradients: v ← βv + (1−β)(f')², then x ← x − η·f' / (√v + ε). β ∈ (0, 1) (often 0.9) controls the memory. Reduces AdaGrad’s aggressive shrinking over long runs.
  • Adam: Combines momentum-like first moments and RMSprop-like second moments: m ← β₁m + (1−β₁)f', v ← β₂v + (1−β₂)(f')², with bias correction m̂ = m/(1−β₁ᵗ) and v̂ = v/(1−β₂ᵗ) at step t, then x ← x − η·m̂ / (√v̂ + ε). β₁ (often 0.9) and β₂ (often 0.999) control the decay. A good default for many problems.
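The three adaptive updates above can be sketched as single-variable step functions (a hedged illustration; the `state` dictionaries and hyperparameter values are assumptions mirroring the typical defaults quoted in the text, not the simulation's internals):

```python
import math

def adagrad_step(x, g, state, eta=0.5, eps=1e-8):
    # G <- G + g^2, then x <- x - eta * g / (sqrt(G) + eps)
    state["G"] = state.get("G", 0.0) + g * g
    return x - eta * g / (math.sqrt(state["G"]) + eps)

def rmsprop_step(x, g, state, eta=0.1, beta=0.9, eps=1e-8):
    # v <- beta*v + (1-beta)*g^2, then x <- x - eta * g / (sqrt(v) + eps)
    state["v"] = beta * state.get("v", 0.0) + (1 - beta) * g * g
    return x - eta * g / (math.sqrt(state["v"]) + eps)

def adam_step(x, g, state, eta=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # First and second moments with bias correction, then the Adam update.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** t)   # bias-corrected second moment
    return x - eta * m_hat / (math.sqrt(v_hat) + eps)
```

On the very first step both AdaGrad and Adam move by roughly η regardless of the gradient's magnitude, since the accumulator is dominated by the current squared gradient; this normalization is what tames the Cliff landscape.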

Convergence & divergence. The run stops when |f'(x)| < 0.001 (converged). If x or f(x) becomes non-finite or |x| > 50, the simulation reports “Diverged” and halts. Reset clears the trajectory and any adaptive/momentum state.
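The stopping logic above amounts to a small status check, sketched here under the thresholds quoted in the text (the function name is illustrative):

```python
import math

def check_status(x, fx, gx, tol=1e-3, x_limit=50.0):
    """Classify the run: converge on a tiny gradient, diverge on blow-up."""
    if not (math.isfinite(x) and math.isfinite(fx)):
        return "Diverged"          # x or f(x) became NaN/inf
    if abs(x) > x_limit:
        return "Diverged"          # wandered outside |x| <= 50
    if abs(gx) < tol:
        return "Converged"         # |f'(x)| < 0.001
    return "Running"
```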

 

Controls

(Interactive controls render here: sliders showing η = 0.1000, initial x = −3.00, x min = −4.0, x max = 4.0, μ = 0.90, plus live readouts of x, Loss, and Gradient f'(x), and an insight line stating which way the slope suggests moving; GD goes opposite to the gradient.)

 

Parameters

  • Function: Selects the loss landscape. Each option includes a short description below the dropdown. The six functions are:
    • The Bowl (Ideal Convex): f(x) = x². Smooth convergence; steps shrink as the slope decreases.
    • The Trap (Local Minima): f(x) = x⁴ − 2x² + 0.5x. Try starting at x = −2 vs x = 2; one leads to a shallow local minimum.
    • The Ripples (Noise): f(x) = x² + sin(5x). Hard to converge; the agent gets stuck in local bumps.
    • The Plateau (Vanishing Gradient): Flattening edges; slopes near zero. Learning stalls unless η is large.
    • The Cliff (Exploding Gradient): Very steep walls. A standard η causes massive overshooting.
    • The Sharp V (Non-Differentiable): f(x) = |x|. Slope is ±1; the algorithm never settles and bounces around the minimum.
  • Learning rate (η): Step size for each update. Range 0.001–1.5 with a logarithmic slider for finer control. Large η can overshoot or diverge; small η converges slowly.
  • Initial x: Starting position on the x-axis. Use the slider or click on the main canvas to set it. Reset restores this value and clears the trajectory.
  • x min / x max: Horizontal range of the main plot (both −8 to 8). Initial x and click-to-set are clamped to this range.
  • Auto Scale (plot): When ON, the main plot’s y-axis includes the trajectory and current point; when OFF (default), y-range is from the function curve over [x min, x max] only.
  • Momentum: When ON, updates use Polyak momentum: v ← μv − η·f'(x), then xx + v. μ (momentum coefficient, 0–1) is set by the slider; the slider is enabled only when Momentum is ON.
  • Auto RL: Adaptive learning-rate method (default None). Options: AdaGrad, RMSprop, Adam. When one is selected, its parameters appear as sliders and Momentum is disabled. All apply in real time (including during Run). Slider ranges: ε (1e−10–1e−4, log scale); β (RMSprop, 0.8–0.999); β₁, β₂ (Adam, β₁ 0.8–0.999, β₂ 0.99–0.9999). Changing the algorithm clears its internal accumulators; changing only parameter values does not.
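For reference, the four landscapes whose formulas are given above can be written out with hand-derived gradients (the Plateau and Cliff curves are not specified in closed form here, so they are omitted; the dictionary keys are illustrative names, and the Sharp V uses the sign of x as a subgradient):

```python
import math

# (function, gradient) pairs for the closed-form landscapes listed above.
FUNCTIONS = {
    "bowl":    (lambda x: x * x,
                lambda x: 2 * x),
    "trap":    (lambda x: x**4 - 2 * x**2 + 0.5 * x,
                lambda x: 4 * x**3 - 4 * x + 0.5),
    "ripples": (lambda x: x * x + math.sin(5 * x),
                lambda x: 2 * x + 5 * math.cos(5 * x)),
    "sharp_v": (lambda x: abs(x),
                lambda x: 1.0 if x > 0 else -1.0 if x < 0 else 0.0),
}
```

The asymmetry of the trap comes from the 0.5x term: it tilts the double well so that the two basins sit at different depths, which is why starting at x = −2 versus x = 2 ends in different minima.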

Buttons

  • Step Fwd: Performs one gradient-descent update (forward). Stops Run if active.
  • Step Bwd: Undoes the last step, reverting to the previous x and loss. Clears momentum and Auto RL accumulators so the next Step Fwd starts from that state. Disabled at the initial point.
  • Run / Stop: Toggle button. Run starts automatic updates until convergence or you click Stop.
  • Reset: Stops Run, restores the initial condition (position from Initial x slider, trajectory cleared), clears momentum and Auto RL accumulators, and redraws.

Visualization

  • Main canvas: Function curve (blue), trajectory (purple; the Trajectory dot checkbox draws a point per step, Trajectory line a dashed path; both can be on at once), tangent line at the current x (orange, dashed), an update arrow on the x-axis (green = loss decreased, red = overshot), and a ball at the current (x, f(x)).
  • Loss plot: Loss vs iteration. Y-axis auto-scaled.
  • Metrics: Sidebar panel showing current x, loss f(x), and gradient f'(x).
  • Insight box: Explains which way the slope suggests moving and highlights when the last step overshot (loss increased).
  • Click-to-set: Click on the main canvas to set a new starting x and reset the path.