Web Simulation

Long Short-Term Memory (LSTM) Network Tutorial 

This interactive tutorial demonstrates the Long Short-Term Memory (LSTM) network, an advanced recurrent neural network architecture designed to overcome the limitations of basic RNNs. LSTMs maintain both short-term memory (the hidden state) and long-term memory (the cell state) through specialized gating mechanisms that selectively remember or forget information over long sequences. This makes them well suited to tasks with long-term temporal dependencies, such as language modeling, machine translation, speech recognition, and time series prediction. The tutorial visualizes a simplified LSTM architecture with a single input, a single LSTM cell (with forget gate, input gate, and output gate), and a single output, making it easy to understand how LSTMs work at a fundamental level.

The visualization displays two main components: (1) Network Diagram (top) - shows the LSTM cell with gates (Forget Gate, Input Gate, Output Gate), Cell State, and Hidden State, with connections and weight values, where the gates and states change color intensity/opacity based on their activation values to visualize the "memory" mechanisms, (2) Time Series Graph (bottom) - shows multiple lines (Input, Cell State, Hidden State, Output, and Gate activations) plotted against time steps, demonstrating how the cell state maintains long-term memory, how the hidden state provides short-term output, and how the gates control information flow. The graphs are rendered using HTML5 Canvas for real-time visualization with a dark theme (black background) and bright colors for optimal visibility. Real-time statistics display the current values of the cell state C(t), hidden state h(t), gate activations, output y(t), and the mathematical calculation for the current step.

The simulator implements the standard LSTM equations: the forget gate f(t) = σ(W_f · [h(t-1), x(t)] + b_f) decides what to forget from the cell state; the input gate i(t) = σ(W_i · [h(t-1), x(t)] + b_i) decides what new information to store; the cell state candidate C̃(t) = tanh(W_C · [h(t-1), x(t)] + b_C) proposes candidate values for the cell state; the cell state update C(t) = f(t) × C(t-1) + i(t) × C̃(t) maintains the long-term memory; the output gate o(t) = σ(W_o · [h(t-1), x(t)] + b_o) decides what parts of the cell state to output; the hidden state h(t) = o(t) × tanh(C(t)) provides the short-term memory/output; and the output is y(t) = W_out × h(t), where W_out is the output weight. You can define an input sequence (e.g., "0, 1, 0, 0, 0") and adjust the gate weights and biases using sliders. Control buttons allow you to Step Forward (advance one time step), Play/Auto (run the sequence automatically), and Reset (return to the initial state). An educational math panel displays the exact calculation for the current step, showing all gate computations and state updates, making the mathematical process transparent.

NOTE: This simulation demonstrates LSTM memory in a "glass box" architecture where every gate, weight, activation, and calculation is visible. The simplified LSTM (1 input, 1 LSTM cell, 1 output) is simple enough to visualize every gate mechanism and understand how the cell state maintains long-term memory and how gates control information flow, yet rich enough to demonstrate the fundamental concepts of LSTMs. The key insight is the gating mechanism: the forget gate selectively removes information from the cell state, the input gate selectively adds new information, and the output gate controls what information is exposed as the hidden state. This demonstrates how LSTMs overcome the vanishing gradient problem of basic RNNs and maintain long-term dependencies.

Mathematical Model

The Long Short-Term Memory (LSTM) network is an advanced recurrent neural network architecture designed to overcome the vanishing gradient problem of basic RNNs. LSTMs maintain both a cell state (long-term memory) and a hidden state (short-term memory/output) through specialized gating mechanisms. The network processes sequences one element at a time, updating its cell state and hidden state at each time step based on the current input, previous hidden state, and previous cell state.

LSTM Equations:

Forget Gate: f(t) = σ(W_f · [h(t-1), x(t)] + b_f) (What to forget)
Input Gate: i(t) = σ(W_i · [h(t-1), x(t)] + b_i) (What to remember)
Cell Candidate: C̃(t) = tanh(W_C · [h(t-1), x(t)] + b_C) (New candidate values)
Cell State: C(t) = f(t) × C(t-1) + i(t) × C̃(t) (Long-term memory)
Output Gate: o(t) = σ(W_o · [h(t-1), x(t)] + b_o) (What to output)
Hidden State: h(t) = o(t) × tanh(C(t)) (Short-term memory/output)
Output: y(t) = W_out × h(t) (Linear output)
Initial States: h(0) = 0, C(0) = 0 (Zero initialization)

where:

  • C(t): Cell state at time t (the "long-term memory" of the network) - maintains information across many time steps
  • h(t): Hidden state at time t (the "short-term memory/output" of the network) - the exposed output of the LSTM cell
  • f(t): Forget gate activation at time t (sigmoid output in [0, 1]) - controls what information to forget from cell state
  • i(t): Input gate activation at time t (sigmoid output in [0, 1]) - controls what new information to add to cell state
  • o(t): Output gate activation at time t (sigmoid output in [0, 1]) - controls what parts of cell state to expose as hidden state
  • C̃(t): Cell state candidate at time t (tanh output in [-1, 1]) - new candidate values to potentially add to cell state
  • x(t): Input at time t (from the input sequence) - an input value (scalar, not a neuron), typically 0 or 1
  • y(t): Output at time t (the network's prediction/response) - a neuron that computes y(t) = W_out × h(t)
  • W_f, W_i, W_C, W_o: Weight matrices for forget gate, input gate, cell candidate, and output gate respectively
  • b_f, b_i, b_C, b_o: Bias terms for forget gate, input gate, cell candidate, and output gate respectively
  • W_out: Output weight (scales the hidden state to produce output)
  • σ: Sigmoid activation function (squashes values to [0, 1]) - used in all gates
  • tanh: Hyperbolic tangent activation function (squashes values to [-1, 1]) - used in cell candidate and hidden state
  • t: Time step (integer: 0, 1, 2, ...)
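The equations above can be sketched as a single scalar update function. This is a minimal illustration, not the simulator's actual source: in particular, it assumes the concatenation [h(t-1), x(t)] is reduced to the simple sum h(t-1) + x(t) before each scalar weight is applied; the simulator's internal combination may differ.

```python
import math

def sigmoid(z):
    """Logistic function sigma: squashes z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of a scalar glass-box LSTM.

    W is a dict with gate weights 'f', 'i', 'C', 'o' and the output
    weight 'out'; b holds the gate biases. The pre-activation input
    z = h(t-1) + x(t) is an assumption standing in for [h(t-1), x(t)].
    """
    z = h_prev + x                             # stand-in for [h(t-1), x(t)]
    f = sigmoid(W['f'] * z + b['f'])           # forget gate: what to keep
    i = sigmoid(W['i'] * z + b['i'])           # input gate: what to add
    c_tilde = math.tanh(W['C'] * z + b['C'])   # cell state candidate
    c = f * c_prev + i * c_tilde               # long-term memory update
    o = sigmoid(W['o'] * z + b['o'])           # output gate: what to expose
    h = o * math.tanh(c)                       # hidden state (filtered view)
    y = W['out'] * h                           # linear output
    return h, c, f, i, o, y
```

Calling lstm_step repeatedly over a sequence, feeding each step's h and c back in, reproduces the step-by-step behavior the Math Panel displays.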

Understanding the Terms:

Cell State (C(t)): The cell state is the "long-term memory" of the LSTM. It maintains information across many time steps with minimal modification, allowing the network to remember information from the distant past. The cell state is updated through a combination of forgetting (via the forget gate) and adding new information (via the input gate). Unlike the hidden state, the cell state flows through the network with only linear transformations (multiplications and additions), which helps prevent the vanishing gradient problem. The cell state can maintain information for hundreds of time steps, making LSTMs ideal for long sequences.

Hidden State (h(t)): The hidden state is the "short-term memory/output" of the LSTM. It is derived from the cell state through the output gate and tanh activation. The hidden state serves as both the output of the LSTM cell and as input to the next time step (along with the current input). Unlike basic RNNs where the hidden state is the only memory, in LSTMs the hidden state is a filtered view of the cell state, controlled by the output gate. The hidden state is used for predictions and is passed to the next time step along with the cell state.

Forget Gate (f(t)): The forget gate decides what information to discard from the cell state. It takes the previous hidden state h(t-1) and current input x(t) as input, applies a sigmoid activation, and outputs values between 0 and 1. A value of 1 means "keep this information completely", while 0 means "forget this information completely". The forget gate allows the LSTM to selectively remove irrelevant information from the cell state, making room for new information.

Input Gate (i(t)): The input gate decides what new information to store in the cell state. It has two parts: (1) the input gate activation i(t) (sigmoid) that decides which values to update, and (2) the cell candidate C̃(t) (tanh) that creates new candidate values. The input gate works together with the forget gate: the forget gate removes old information, and the input gate adds new information. This selective updating allows the LSTM to maintain long-term memory while incorporating new information.

Output Gate (o(t)): The output gate decides what parts of the cell state to expose as the hidden state. It takes the previous hidden state h(t-1) and current input x(t) as input, applies a sigmoid activation, and outputs values between 0 and 1. The output gate filters the cell state (after tanh activation) to produce the hidden state. This allows the LSTM to control what information is used for predictions and what is passed to the next time step.

Output (y(t)): The output is a linear transformation of the hidden state: y(t) = W_out × h(t). The output weight W_out scales the hidden state to produce the final output. Since the hidden state is derived from the cell state (which contains long-term memory), the output reflects both the current input and information from past inputs, potentially from many time steps ago.

Input (x(t)): The input x(t) is an input value (a scalar, typically 0 or 1), not a neuron. It represents external data fed into the network at each time step. The input is used by all gates (forget, input, output) to make decisions about what to remember, forget, and output.

Memory and Temporal Dependencies:

The gating mechanism creates sophisticated memory by allowing the cell state to selectively maintain information over long sequences. This enables LSTMs to process sequences with long-term temporal dependencies - patterns that depend on elements from many time steps ago. For example, in language modeling, the network can learn that after seeing "The cat sat on the...", the next word is likely "mat" because the cell state maintains the context from previous words. Unlike basic RNNs where memory decays exponentially, LSTMs can maintain information for hundreds of time steps through the cell state, which is protected from decay by the gating mechanism. The forget gate controls what to remove, the input gate controls what to add, and the output gate controls what to expose - this selective memory management is the key to LSTM's success.
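The contrast with basic-RNN memory decay can be made concrete with a toy experiment. The numbers below are illustrative assumptions, not simulator defaults: a forget gate held open by a bias of 2.0 for the LSTM, and a recurrent weight of 0.5 for the basic RNN.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Track how much of a single input pulse survives after 20 quiet steps.
# Assumptions: LSTM forget gate biased open (b_f = 2.0, so f is about 0.88);
# basic RNN hidden state updated as h(t) = tanh(0.5 * h(t-1)).
steps = 20
c = 1.0          # LSTM cell state right after the pulse
h_rnn = 1.0      # basic-RNN hidden state right after the pulse
for t in range(steps):
    f = sigmoid(2.0)                # forget gate stays mostly open
    c = f * c                       # cell state decays only as fast as f allows
    h_rnn = math.tanh(0.5 * h_rnn)  # RNN memory roughly halves every step

print(f"LSTM cell state after {steps} steps: {c:.4f}")
print(f"Basic RNN hidden state after {steps} steps: {h_rnn:.6f}")
```

The cell state retains a few percent of the pulse after 20 steps, while the basic RNN's hidden state has shrunk to roughly a millionth of its starting value: the multiplicative gate, not repeated squashing, governs LSTM decay.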

Network Diagram Visualization: The network diagram shows the LSTM cell with gates (Forget Gate, Input Gate, Output Gate), Cell State, Hidden State, and connections with weight labels. The gates are visualized as special nodes that control information flow, with their activations (sigmoid outputs) shown as values between 0 and 1. The Cell State flows horizontally through the network, maintaining long-term memory. The Hidden State is derived from the Cell State through the Output Gate. The gates change color intensity/opacity based on their activation values: brighter colors indicate higher activation (gate is more "open"), dimmer colors indicate lower activation (gate is more "closed"). This visual feedback shows how the gating mechanism controls information flow over time, with gates "opening" and "closing" to manage memory.


Usage Example

Follow these steps to explore how the LSTM maintains memory and processes sequences:

  1. Initial State: When you first load the simulation, you'll see: (1) Network Diagram (top) - shows the LSTM cell with gates (Forget Gate, Input Gate, Output Gate), Cell State, and Hidden State with default weights, (2) Time Series Graph (bottom) - empty, ready to display the sequence, (3) Math Panel - displays the calculation for the current step. The input sequence field is empty. The cell state C(0) and hidden state h(0) are initialized to 0. Notice the gates controlling information flow - this is what creates the sophisticated memory mechanism.
  2. Define Input Sequence: Enter an input sequence in the "Input Sequence" field (e.g., "0, 1, 0, 0, 0" or "1, 1, 0, 1"). The sequence is comma-separated values (0 or 1). This sequence will be processed step by step. Try a simple pattern first: "1, 0, 0, 0, 0" (a single pulse) to observe how the cell state maintains memory and how gates control information flow.
  3. Observe Gate Operations: Use the "Step Forward" button to advance one time step at a time. Watch: (1) The Input shows the current input value (0 or 1), (2) The Forget Gate activation f(t) determines what to forget from cell state, (3) The Input Gate activation i(t) determines what new information to add, (4) The Cell State C(t) maintains long-term memory, (5) The Output Gate activation o(t) determines what to expose as hidden state, (6) The Hidden State h(t) is the filtered output, (7) The Math Panel displays the exact calculation for all gates and states, (8) The Time Series Graph plots the Input, Cell State, Hidden State, Gate activations, and Output values. Try the sequence "1, 0, 0, 0, 0" with default weights - notice how the cell state maintains information while the gates control what is remembered and forgotten.
  4. Adjust Gate Weights: Experiment with the gate weight sliders to control memory behavior. Try:
    • Forget Gate weights - High values: forget less (strong memory retention), Low values: forget more (weak memory retention)
    • Input Gate weights - High values: add more new information, Low values: add less new information
    • Output Gate weights - High values: expose more of cell state as hidden state, Low values: expose less
    Reset and replay the sequence "1, 0, 0, 0, 0" with different gate weights. Observe how the gates control information flow and how the cell state maintains long-term memory differently than basic RNNs.
  5. Use Play/Auto Mode: Click "Play" to automatically advance through the sequence. The simulation will process each input step automatically with a small delay between steps, making it easy to observe how the gates operate and how the cell state and hidden state evolve. Click "Pause" to stop and examine the current state. Use Play to see the dynamic behavior: watch the gates "open" and "close", the cell state maintain long-term memory, and the hidden state provide filtered output.
  6. Observe Cell State vs Hidden State: Notice the difference between the Cell State and Hidden State lines in the Time Series Graph. The Cell State maintains long-term memory and can persist for many time steps, while the Hidden State is a filtered view controlled by the Output Gate. The Cell State is the "long-term memory" while the Hidden State is the "working memory/output". This separation is what allows LSTMs to maintain long-term dependencies.
  7. Understand Gate Interactions: Observe how the three gates work together: (1) The Forget Gate removes irrelevant information from the cell state, (2) The Input Gate adds new relevant information to the cell state, (3) The Output Gate controls what information is exposed as the hidden state. These gates operate simultaneously at each time step, allowing the LSTM to selectively manage memory. Watch the gate activation values in the Time Series Graph to see how they control information flow.
  8. Understand the Math Panel: The Math Panel displays the exact calculation for the current step, showing: (1) All gate equations (forget, input, output) with actual values, (2) The cell state candidate calculation, (3) The cell state update (combining forget and input operations), (4) The hidden state calculation (output gate filtering the cell state), (5) The intermediate calculations for each gate. Use Step Forward mode to observe how all gates operate simultaneously and how they control the cell state and hidden state updates.
  9. Try Complex Sequences: Experiment with different input sequences to observe various behaviors:
    • "1, 1, 1, 0, 0" - Multiple consecutive inputs: cell state accumulates information, gates control what to remember
    • "1, 0, 1, 0, 1" - Alternating pattern: observe how gates manage information flow with changing inputs
    • "0, 0, 0, 1, 0" - Late pulse: observe how cell state can maintain information even after many zeros, demonstrating long-term memory
    Each sequence demonstrates different aspects of LSTM memory management and gate operations.
  10. Reset and Explore: Click "Reset" to clear the simulation and return to the initial state (C(0) = 0, h(0) = 0). Try different combinations of gate weights and sequences to fully understand how LSTMs maintain memory and process sequences. The key insight is the gating mechanism: the gates create a sophisticated memory system that can selectively remember, forget, and output information over long sequences.

Tip: The key insight to look for is how the gating mechanism creates sophisticated memory. Unlike basic RNNs where memory decays exponentially, LSTMs use gates to selectively manage information. Watch how: (1) The Forget Gate controls what to remove from cell state, (2) The Input Gate controls what to add to cell state, (3) The Output Gate controls what to expose as hidden state, (4) The Cell State maintains long-term memory across many time steps. Try a simple sequence like "1, 0, 0, 0, 0" and observe how the cell state maintains information while gates control information flow - the Cell State line will show long-term memory retention, and the gate activation lines will show how gates control this process. The Math Panel helps you understand the exact calculations happening at each step for all gates and states.
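The walkthrough above can also be replayed outside the browser. This sketch prints a Math-Panel-style table for the sequence "1, 0, 0, 0, 0" under the default slider values (gate weights 0.5, biases 0.0, W_out 1.0); it assumes, as a simplification, that h(t-1) and x(t) are combined by a plain sum, which may not match the simulator exactly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W, b, W_out = 0.5, 0.0, 1.0   # assumed defaults: all gate weights 0.5, biases 0
h, c = 0.0, 0.0               # zero initialization: h(0) = 0, C(0) = 0
print(" t  x    f(t)   i(t)   o(t)   C(t)   h(t)   y(t)")
for t, x in enumerate([1, 0, 0, 0, 0], start=1):
    z = h + x                          # stand-in for [h(t-1), x(t)]
    f = sigmoid(W * z + b)             # forget gate
    i = sigmoid(W * z + b)             # input gate
    c_tilde = math.tanh(W * z + b)     # cell candidate
    c = f * c + i * c_tilde            # cell state update
    o = sigmoid(W * z + b)             # output gate
    h = o * math.tanh(c)               # hidden state
    y = W_out * h                      # output
    print(f"{t:2d}  {x}  {f:6.3f} {i:6.3f} {o:6.3f} {c:6.3f} {h:6.3f} {y:6.3f}")
```

Reading down the C(t) column shows the pulse entering the cell state at t = 1 and then decaying gradually, since with these defaults the forget gate sits near 0.5 rather than fully open.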

Parameters

The following are short descriptions of each parameter:
  • Input Sequence: A comma-separated sequence of input values (0 or 1) that defines the sequence to be processed by the LSTM. For example, "0, 1, 0, 0, 0" or "1, 1, 0, 1". The sequence is processed step by step, with each value representing the input x(t) at time step t. You can define any sequence length. Default: empty. The sequence is entered in a text field in the control panel. The LSTM processes the sequence from left to right, maintaining its cell state (long-term memory) and hidden state (short-term memory/output) across time steps.
  • W_f (Forget Gate Weight): The weight for the forget gate that determines what information to remove from the cell state. Range: -2.0 to 2.0. Default: 0.5. Higher values: forget less (strong memory retention), Lower values: forget more (weak memory retention). The forget gate controls what to discard from the cell state, making room for new information. Adjust using a slider in the control panel.
  • W_i (Input Gate Weight): The weight for the input gate that determines what new information to add to the cell state. Range: -2.0 to 2.0. Default: 0.5. Higher values: add more new information, Lower values: add less new information. The input gate works together with the cell candidate to add new information to the cell state. Adjust using a slider in the control panel.
  • W_C (Cell Candidate Weight): The weight for the cell candidate that creates new candidate values for the cell state. Range: -2.0 to 2.0. Default: 0.5. The cell candidate is combined with the input gate to update the cell state. Adjust using a slider in the control panel.
  • W_o (Output Gate Weight): The weight for the output gate that determines what information to expose as hidden state. Range: -2.0 to 2.0. Default: 0.5. Higher values: expose more of cell state as hidden state, Lower values: expose less. The output gate filters the cell state to produce the hidden state. Adjust using a slider in the control panel.
  • W_out (Output Weight): The weight that scales the hidden state h(t) to produce the output y(t). Range: -2.0 to 2.0. Default: 1.0. The output is y(t) = W_out × h(t), so W_out controls the magnitude of the output signal. Adjust using a slider in the control panel.
  • b_f (Forget Gate Bias): A constant offset for the forget gate. Range: -1.0 to 1.0. Default: 0.0. The bias term adds a constant value to the forget gate calculation. Adjust using a slider in the control panel.
  • b_i (Input Gate Bias): A constant offset for the input gate. Range: -1.0 to 1.0. Default: 0.0. The bias term adds a constant value to the input gate calculation. Adjust using a slider in the control panel.
  • b_C (Cell Candidate Bias): A constant offset for the cell candidate. Range: -1.0 to 1.0. Default: 0.0. The bias term adds a constant value to the cell candidate calculation. Adjust using a slider in the control panel.
  • b_o (Output Gate Bias): A constant offset for the output gate. Range: -1.0 to 1.0. Default: 0.0. The bias term adds a constant value to the output gate calculation. Adjust using a slider in the control panel.
  • Cell State C(t): The long-term memory of the LSTM, updated at each time step according to C(t) = f(t) × C(t-1) + i(t) × C̃(t). Range: unbounded (though typically stays within reasonable bounds). Initial value: C(0) = 0. The cell state maintains information across many time steps and is the "highway" for long-term memory. It is visualized in the network diagram (Cell State Highway - blue horizontal line) and in the Time Series Graph (Cell State line - blue solid). The cell state is the key to LSTM's long-term memory capability.
  • Hidden State h(t): The short-term memory/output of the LSTM, calculated as h(t) = o(t) × tanh(C(t)). Range: [-1, 1] (due to tanh activation). Initial value: h(0) = 0. The hidden state is the filtered output of the cell state, controlled by the output gate. It is visualized in the network diagram and in the Time Series Graph (Hidden State line - purple/brown dashed). The hidden state provides the actual output of the LSTM cell.
  • Output y(t): The network's output at time step t, calculated as y(t) = W_out × h(t). Range: depends on W_out and h(t). The output is a linear transformation of the hidden state. Visualized in the Time Series Graph (Output line - orange dotted).
  • Network Architecture: The LSTM architecture consists of: (1) Input x(t) - the input value (a scalar value 0 or 1) from the sequence, (2) LSTM Cell - a cell containing a cell state C(t) (long-term memory), a hidden state h(t) (short-term memory/output), and four gates (Forget Gate f(t), Input Gate i(t), Cell Candidate C̃(t), Output Gate o(t)), (3) Output Neuron y(t) - a neuron that produces the output y(t) = W_out × h(t) from the hidden state. The network diagram visualizes the internal architecture of the LSTM cell: Cell State Highway (blue horizontal line), gates (yellow circles with σ or tanh), operations (white/green circles with × or +), and connections showing information flow. The flattened network view shows multiple time steps with dual-state connections (Cell State and Hidden State).

Controls and Visualizations

The following are short descriptions of each control:
  • Input Sequence Field: A text input field where you enter the input sequence as comma-separated values (0 or 1). For example, "0, 1, 0, 0, 0" or "1, 1, 0, 1". The sequence defines the inputs x(t) that will be processed step by step. Located in the control panel. The sequence can be any length. Default: empty. When you enter a sequence and click Step Forward or Play, the LSTM processes each value in order, maintaining its cell state (long-term memory) and hidden state (short-term memory/output) across steps.
  • Gate Weight Sliders (W_f, W_i, W_C, W_o): Control the forget gate, input gate, cell candidate, and output gate weights (range: -2.0 to 2.0, default: 0.5 each). Located in the control panel with labels and value displays. The forget gate weight controls memory retention (higher values: forget less), the input gate weight controls how much new information is added, the cell candidate weight shapes the new candidate values, and the output gate weight controls how much of the cell state is exposed as the hidden state. The sliders update in real-time, immediately affecting the gate calculations when you step through the sequence. Reset and replay a sequence with different gate weights to observe the effect on memory behavior.
  • W_out (Output Weight) Slider: Controls the output weight W_out (range: -2.0 to 2.0, default: 1.0). Located in the control panel with label and value display. The output is y(t) = W_out × h(t), so W_out scales the hidden state to produce the output. The slider updates in real-time, immediately affecting the output values. Adjust to scale the output signal.
  • Bias Sliders (b_f, b_i, b_C, b_o): Control the bias terms for the four gates (range: -1.0 to 1.0, default: 0.0 each). Located in the control panel with labels and value displays. Each bias adds a constant offset to its gate's pre-activation, shifting how easily that gate opens or closes. The sliders update in real-time.
  • Step Forward Button: Advances the simulation one time step forward, processing the next input in the sequence. Located in the control panel. When clicked, the LSTM: (1) Reads the next input x(t) from the sequence, (2) Computes the gate activations f(t), i(t), o(t) and the cell candidate C̃(t), (3) Updates the cell state C(t) = f(t) × C(t-1) + i(t) × C̃(t) and the hidden state h(t) = o(t) × tanh(C(t)), (4) Calculates the output y(t) = W_out × h(t), (5) Updates the network diagram (gate color intensity, connection values), (6) Plots the new points on the Time Series Graph, (7) Updates the Math Panel with the exact calculation for this step. Use Step Forward to observe the simulation step by step, carefully watching how the gates and states evolve and how the Math Panel shows the calculations.
  • Play/Auto Button: Automatically advances through the sequence with a small delay between steps. Located in the control panel. When clicked, the simulation processes each input in the sequence automatically, making it easy to observe the overall behavior. Click "Pause" to stop. The Play mode helps visualize the dynamic behavior: watch the gates brighten ("open") and dim ("close") while the cell state maintains long-term memory and the hidden state tracks its filtered output. Use Play to see how memory evolves over time without manually clicking Step Forward.
  • Pause Button: Pauses the automatic playback. Located in the control panel. When clicked during Play mode, the simulation stops at the current step, allowing you to examine the current state. Click Play again to resume.
  • Reset Button: Resets the simulation to the initial state. Located in the control panel. When clicked: (1) The cell state resets to C(0) = 0, (2) The hidden state resets to h(0) = 0, (3) The time step resets to t = 0, (4) The Time Series Graph is cleared, (5) The network diagram shows the initial state. The input sequence and weights are preserved. Use Reset to start over with the same sequence and weights, or to clear the graph and observe the sequence from the beginning.
  • Network Diagram Canvas: Canvas displaying the LSTM cell architecture visualization showing the internal structure: (1) Cell State Highway (blue horizontal line at top) - represents the cell state C(t) flowing through time, (2) Gates (yellow circles with σ or tanh) - Forget Gate, Input Gate, Cell Candidate, Output Gate arranged vertically, (3) Operations (white/green circles with × or +) - multiplication and addition operations directly on the highway, (4) Connections - orthogonal routing (only horizontal and vertical lines) showing information flow, (5) Inputs (green circles) - x(t) and h(t-1) entering from bottom, (6) Outputs (orange/purple circles) - h(t) and y(t) exiting from right. The diagram uses clean orthogonal routing for clarity. All gate activations and state values are displayed.
  • Flattened Network Canvas: Canvas displaying the unrolled network view showing multiple time steps with: (1) LSTM Cells (dark rectangles) - one for each time step showing cell state and hidden state values, (2) Cell State Connections (blue horizontal lines) - connecting cell states C(t) across time steps (the "highway"), (3) Hidden State Connections (brown horizontal lines) - connecting hidden states h(t) across time steps, (4) Input Neurons (green circles) - x(t) for each time step, (5) Output Neurons (orange circles) - y(t) for each time step, (6) Labels - C(t), h(t), y(t) labels with time indices. This view clearly shows the dual-state nature of LSTMs: Cell State (long-term memory) and Hidden State (short-term memory/output).
  • Time Series Graph Canvas: Canvas displaying the time series visualization showing lines plotted against time steps: (1) Input (green solid) - the input sequence x(t), (2) Cell State (blue solid) - the cell state C(t) showing long-term memory, (3) Hidden State (purple/brown dashed) - the hidden state h(t) showing filtered output, (4) Forget Gate (yellow line) - forget gate activation f(t), (5) Input Gate (cyan line) - input gate activation i(t), (6) Output Gate (magenta line) - output gate activation o(t), (7) Output (orange dotted) - the output y(t) = W_out × h(t), (8) Target (white dashed, training only) - the target sequence (desired output), only visible during training mode. The X-axis represents time steps (0, 1, 2, ...), and the Y-axis represents values. The graph demonstrates how the cell state maintains long-term memory (persists across many steps) and how the hidden state provides filtered output (controlled by output gate). When training is active, a Training Status Overlay appears displaying: Epoch count, Iteration count, and Loss (MSE) value. When you step through a sequence, each new point is plotted, building up the time series over time.
  • Math Panel: A text display panel that shows the exact mathematical calculations for the current step. Displays: (1) All gate equations with actual values (Forget Gate f(t), Input Gate i(t), Cell Candidate C̃(t), Output Gate o(t)), (2) Cell state update equation C(t) = f(t) × C(t-1) + i(t) × C̃(t) with actual values, (3) Hidden state equation h(t) = o(t) × tanh(C(t)) with actual values, (4) Output equation y(t) = W_out × h(t) with actual values, (5) Intermediate calculations for each gate and state. Located above or below the graphs. This educational feature makes the mathematical process transparent, helping users understand exactly how the LSTM calculates all gates and states at each step. The Math Panel updates whenever you click Step Forward or when Play mode advances to a new step.
  • Real-Time Statistics Display: Text overlay displaying current values in real-time: Time Step t, Input x(t), Cell State C(t), Hidden State h(t), Forget Gate f(t), Input Gate i(t), Output Gate o(t), Output y(t), and Loss (MSE, during training). Located in the control panel. The statistics update continuously as you step through the sequence, showing the current state of the system. Uses Courier New font with bright text on dark background for visibility.

Key Concepts

  • Long Short-Term Memory (LSTM): An advanced type of recurrent neural network architecture designed to overcome the vanishing gradient problem of basic RNNs. LSTMs maintain both a cell state (long-term memory) and a hidden state (short-term memory/output) through specialized gating mechanisms. Unlike basic RNNs, LSTMs use forget gates, input gates, and output gates to selectively manage information flow, allowing them to maintain long-term dependencies over hundreds of time steps. LSTMs are used for tasks like language modeling, machine translation, speech recognition, time series prediction, and sequence classification where long-term context is important.
  • LSTM Cell: The fundamental building block of LSTM networks, consisting of a cell state (long-term memory), a hidden state (short-term memory/output), and three gates (forget gate, input gate, output gate). In this simulation, we use a simplified LSTM: 1 input, 1 LSTM cell with gates, and 1 output. This architecture makes it easy to understand the fundamental concepts of LSTMs: how the cell state maintains long-term memory, how the gates control information flow, how the hidden state provides filtered output, and how the network processes sequences. Despite its simplicity, this LSTM demonstrates all the key principles: selective memory management through gating, long-term dependency handling, and sophisticated information flow control.
  • Cell State (Long-Term Memory): The cell state C(t) is the "highway" for long-term memory in LSTMs. It is updated at each time step according to C(t) = f(t) × C(t-1) + i(t) × C̃(t), where the forget gate f(t) controls what to keep from the previous cell state, and the input gate i(t) controls what new information to add. The cell state can maintain information across many time steps, making it ideal for long-term dependencies. Unlike the hidden state which changes at each time step, the cell state flows relatively unchanged (controlled by the forget gate), allowing information to persist. The cell state is visualized as a blue horizontal line (the "highway") in the network diagram and as a blue solid line in the Time Series Graph.
  • Hidden State (Short-Term Memory/Output): The hidden state h(t) is the filtered output of the LSTM, calculated as h(t) = o(t) × tanh(C(t)). The output gate o(t) controls what parts of the cell state to expose as the hidden state. Unlike the cell state which maintains long-term memory, the hidden state changes at each time step and provides the actual output of the LSTM cell. The hidden state is visualized in the network diagram and as a purple/brown dashed line in the Time Series Graph. The separation between cell state (long-term memory) and hidden state (short-term memory/output) is what allows LSTMs to maintain long-term dependencies while providing filtered output.
  • Gating Mechanism: LSTMs use gates (sigmoid-activated functions) to selectively manage information flow. The Forget Gate f(t) decides what to remove from the cell state, the Input Gate i(t) decides what new information to add, and the Output Gate o(t) decides what to expose as the hidden state. These gates work together to create sophisticated memory management: the forget gate removes irrelevant information, the input gate adds new relevant information, and the output gate filters what to expose. This gating mechanism is what distinguishes LSTMs from basic RNNs and allows them to maintain long-term dependencies.
  • Tanh Activation: The hyperbolic tangent activation function, used twice in the LSTM: to compute the cell state candidate C̃(t) = tanh(W_C · [h(t-1), x(t)] + b_C), and to squash the cell state in the hidden state update h(t) = o(t) × tanh(C(t)). Tanh maps values to the range [-1, 1], preventing unbounded growth of the states through the recurrent connections. Tanh is commonly used in recurrent networks because it: (1) Keeps values bounded, (2) Is differentiable everywhere, (3) Has a symmetric, zero-centered output range. Together with the sigmoid gates (which output values in [0, 1] and act as soft switches), tanh keeps the cell and hidden states within stable, predictable bounds.
  • Temporal Dependencies: Patterns in sequential data where the value at time t depends on values at previous time steps. For example, in language modeling, the word "mat" is more likely after "The cat sat on the..." because it depends on the previous context. Recurrent networks handle temporal dependencies by maintaining state that carries information from previous inputs; LSTMs extend this with a gated cell state, allowing dependencies to span far more time steps than a basic RNN can manage. This makes them ideal for tasks involving sequences with dependencies, such as language modeling, time series prediction, and sequence classification.
  • Sequence Processing: LSTMs process sequences one element at a time, updating both the cell state and the hidden state at each time step. The network reads the first input x(0) and computes C(0) and h(0), then reads x(1) and computes C(1) and h(1) (which depend on x(1), C(0), and h(0)), and so on. This sequential processing allows the network to handle variable-length sequences and maintain context across time steps. The simulation demonstrates this by processing an input sequence step by step, showing how both states evolve as each input is processed. The Step Forward button advances one time step at a time, making the sequential processing explicit and observable.
  • Memory Decay: When the input is zero, the cell state decays approximately as C(t) ≈ f(t) × C(t-1), since the input-gated candidate contribution is small. The decay rate is controlled by the forget gate: f(t) close to 1 causes slow decay (memory persists), f(t) close to 0 causes fast decay (memory fades quickly). This is visualized in the Time Series Graph: after an input pulse (e.g., "1, 0, 0, 0, 0"), the Cell State line decays gradually when the forget gate stays open (high activation), or quickly when it closes (low activation). The color intensity of the Cell State in the network diagram reflects this: it stays "lit" longer when the forget gate is high, and "fades" quickly when it is low. Try adjusting the forget gate weights and bias with the sliders to change the decay rate. Memory decay is the mechanism by which the LSTM forgets old information and makes room for new information.
  • Initial State: Both the cell state and the hidden state are initialized to zero at the beginning of the sequence: C(0) = 0 and h(0) = 0, representing "no memory" at the start. As the sequence is processed, the cell state accumulates information from the inputs. When you click Reset, both states return to zero and the simulation starts over. The initial state affects how the network processes the first few inputs, but its influence fades as more inputs are processed (at a rate controlled by the forget gate).
  • What to Look For: When exploring the simulation, observe: (1) How the gate and state colors in the network diagram reflect their activation values (bright when a gate is open or a state is large, dim when closed or small), (2) How the Cell State line in the Time Series Graph maintains long-term memory while the Hidden State line shows the filtered, short-term output, (3) How the forget gate controls memory retention (try resetting and replaying "1, 0, 0, 0, 0" with different forget gate weights and biases), (4) How the Math Panel shows the exact calculations at each step, making the mathematical process transparent, (5) How the cell state "highway" in the network diagram visually represents the path that lets information persist across time steps. This is the moment "math" becomes "memory": the gating mechanism creates a controlled memory system that allows the network to remember, forget, and process sequences with long-term temporal dependencies.
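The memory-decay experiment described above can be sketched in a few lines of Python. This is an illustrative simplification with scalar weights of our own choosing (run_lstm and its parameters are not part of the simulator); it shows how the forget gate bias alone determines how long the cell state retains a pulse:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def run_lstm(xs, w_x=1.0, w_h=0.0, b_f=2.0, b_i=0.0, b_c=0.0, b_o=0.0):
    """Scalar LSTM over a sequence, returning the cell state trace.
    b_f=2.0 biases the forget gate toward 'keep' (sigmoid(2) ~ 0.88),
    so the cell state decays slowly once the input returns to zero."""
    h, c, trace = 0.0, 0.0, []
    for x in xs:
        pre = w_h * h + w_x * x                 # shared pre-activation (sketch)
        f = sigmoid(pre + b_f)                  # forget gate
        i = sigmoid(pre + b_i)                  # input gate
        c = f * c + i * math.tanh(pre + b_c)    # cell state update
        h = sigmoid(pre + b_o) * math.tanh(c)   # output gate -> hidden state
        trace.append(c)
    return trace

slow = run_lstm([1.0, 0.0, 0.0, 0.0, 0.0])            # forget gate mostly open
fast = run_lstm([1.0, 0.0, 0.0, 0.0, 0.0], b_f=-2.0)  # forget gate mostly closed
print([round(c, 3) for c in slow])  # cell state persists across the zeros
print([round(c, 3) for c in fast])  # cell state collapses almost immediately
```

Replaying the same pulse with the two forget-gate biases mirrors what the Cell State line shows in the Time Series Graph when you move the forget gate sliders.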

NOTE: This simulation demonstrates LSTM memory in a completely transparent "glass box" architecture. The simplified LSTM (1 input, 1 LSTM cell, 1 output) is small enough to visualize every gate mechanism, showing how the cell state maintains long-term memory and how the gates control information flow, yet rich enough to demonstrate the fundamental concepts of LSTMs. The key insight is the gating mechanism: the forget gate selectively removes information from the cell state, the input gate selectively adds new information, and the output gate controls what is exposed as the hidden state. This is visualized in the network diagram (gate activations and state values) and in the Time Series Graph (the Cell State line showing long-term memory, the Hidden State line showing filtered output, and the gate activation lines showing information control). Try a simple sequence like "1, 0, 0, 0, 0" (a single pulse) and observe how the cell state retains the pulse while the gates control what is remembered and forgotten; the Math Panel shows the exact calculations for all gates and states at each step. This demonstrates the fundamental concept behind Long Short-Term Memory (LSTM) networks used in modern sequence processing: the gating mechanism creates a memory system that can selectively remember, forget, and output information over long sequences, overcoming the vanishing gradient problem of basic RNNs. The network maintains both long-term memory (cell state) and short-term memory (hidden state) automatically, without any explicit programming; this is the power of LSTMs.
In practice, real-world sequence processing systems use stacked LSTM layers and larger networks with many LSTM cells, but the core principle remains the same: gating mechanisms create sophisticated memory that allows networks to process sequences with long-term temporal dependencies.
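To connect this to the stacked architectures mentioned above, here is a hedged Python sketch (our own simplification, sharing one weight across all gates for brevity; lstm_layer is not part of the simulator). Stacking simply means that layer 2 consumes layer 1's hidden-state sequence as its input sequence:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_layer(xs, wx=1.0, wh=0.5, b=0.0):
    """A single scalar LSTM cell applied over a sequence, returning the
    hidden-state sequence. All gates and the candidate share the same
    illustrative weights here, purely to keep the sketch short."""
    h = c = 0.0
    hs = []
    for x in xs:
        pre = wx * x + wh * h + b
        f, i, o = sigmoid(pre), sigmoid(pre), sigmoid(pre)
        c = f * c + i * math.tanh(pre)   # cell state update
        h = o * math.tanh(c)             # hidden state (this layer's output)
        hs.append(h)
    return hs

# Stacking: layer 2 reads layer 1's hidden states as its input sequence.
pulse = [1.0, 0.0, 0.0, 0.0, 0.0]
layer1 = lstm_layer(pulse)
layer2 = lstm_layer(layer1)
print([round(h, 3) for h in layer1])
print([round(h, 3) for h in layer2])
```

Real systems use vector-valued states, separate learned weights per gate, and many cells per layer, but the data flow between stacked layers is exactly this: each layer's hidden-state sequence becomes the next layer's input.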