Hidden Markov Model (HMM) - Interactive Tutorial

A Hidden Markov Model (HMM) is a statistical model that extends the basic Markov Chain by introducing hidden states and observable emissions. In a standard Markov Chain we can directly observe the state of the system; in an HMM the true state is hidden from us, and we can only observe outputs (emissions) that depend probabilistically on the hidden state.

1. Mathematical Foundation

1.1 Components of an HMM

An HMM is formally defined by the tuple λ = (A, B, π), where:

- A: the transition matrix, giving P(next hidden state | current hidden state)
- B: the emission matrix, giving P(observation | current hidden state)
- π: the initial probability distribution over hidden states
1.2 The Transition Matrix (A) — Hidden → Hidden

The transition matrix A describes the probability of moving from one hidden state to another hidden state:

Key Point: The Transition Matrix operates entirely within the hidden layer. It determines how the system moves between hidden states that we cannot directly observe.
A = | a00 a01 ... a0N | where Σj aij = 1 for all i
| a10 a11 ... a1N |
| ... ... ... ... |
| aN0 aN1 ... aNN |
aij = P(hidden state j at t+1 | hidden state i at t)
1.3 The Emission Matrix (B) — Hidden → Observed

The emission matrix B describes the probability of observing a particular output given the current hidden state:

Key Point: The Emission Matrix is the bridge between the hidden layer and the observable world. It determines what we actually see based on which hidden state the system is in.
B = | b0(O0) b0(O1) ... b0(OM) | where Σk bi(k) = 1 for all i
| b1(O0) b1(O1) ... b1(OM) |
| ... ... ... ... |
| bN(O0) bN(O1) ... bN(OM) |
bi(k) = P(observe Ok | in hidden state Si)
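As a concrete sketch (hypothetical values for a 3-state, 3-observation model), the two matrices can be stored as row-stochastic NumPy arrays:

```python
import numpy as np

# Illustrative parameters only; any values work as long as each row sums to 1
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])  # A[i, j] = P(S_j at t+1 | S_i at t)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])  # B[i, k] = P(observe O_k | S_i)

# The row-normalization constraints from 1.2 and 1.3
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

Note that A is square (hidden × hidden) while B need not be: it has one row per hidden state and one column per observation symbol.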
1.4 Information Flow Diagram

The following diagram illustrates how information flows in an HMM:

  HIDDEN LAYER (Unobservable)

      S0  ⟷  S1  ⟷  S2      ← Transition Matrix (A) controls these →
      ↓       ↓       ↓         Emission Matrix (B) controls these
      O0      O1      O2

  OBSERVABLE LAYER (What We See)
Summary:
We can only observe the bottom row (O0, O1, O2...). The challenge of HMM is to infer the top row (S0, S1, S2...) from the observations!

2. The Three Fundamental Problems of HMMs

2.1 Problem 1: Evaluation (Forward Algorithm)

Given: an observation sequence O = (o1, o2, ..., oT) and model λ
Find: P(O|λ), the probability of the observation sequence given the model

The Forward Algorithm computes this efficiently using dynamic programming:

Initialization:  α1(i) = πi · bi(o1)
Recursion:       αt+1(j) = [Σi αt(i) · aij] · bj(ot+1)
Termination:     P(O|λ) = Σi αT(i)

2.2 Problem 2: Decoding (Viterbi Algorithm)

Given: an observation sequence O and model λ
Find: the most likely hidden state sequence Q* = argmaxQ P(Q|O,λ)

The Viterbi Algorithm finds the optimal path through the hidden states:

Initialization:  δ1(i) = πi · bi(o1),  ψ1(i) = 0
Recursion:       δt(j) = maxi [δt-1(i) · aij] · bj(ot)
                 ψt(j) = argmaxi [δt-1(i) · aij]
Termination:     P* = maxi [δT(i)],  qT* = argmaxi [δT(i)]
Backtracking:    qt* = ψt+1(qt+1*)

2.3 Problem 3: Learning (Baum-Welch Algorithm)

Given: an observation sequence O
Find: model parameters λ = (A, B, π) that maximize P(O|λ)

The Baum-Welch algorithm (a special case of the EM algorithm) iteratively updates the model parameters to maximize the likelihood. It is not demonstrated in this simulation, but it is crucial for training HMMs from data.

3. How to Construct/Estimate the Matrices

A critical question in HMM applications is: how do we determine the Transition Matrix (A) and Emission Matrix (B)? There are three main approaches.

3.1 Manual Specification (Domain Knowledge)

If you have expert knowledge about the domain, you can manually set the matrix values.

Example: Weather Model
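A minimal sketch of such a hand-specified model (hypothetical probabilities, loosely following the classic Rainy/Sunny textbook example), with the Forward and Viterbi recursions from Section 2 run on it:

```python
import numpy as np

# Hypothetical weather HMM: hidden states {0: Rainy, 1: Sunny},
# observations {0: Walk, 1: Shop, 2: Clean}. All values are illustrative.
pi = np.array([0.6, 0.4])               # initial distribution
A = np.array([[0.7, 0.3],               # Rainy -> {Rainy, Sunny}
              [0.4, 0.6]])              # Sunny -> {Rainy, Sunny}
B = np.array([[0.1, 0.4, 0.5],          # emissions from Rainy
              [0.6, 0.3, 0.1]])         # emissions from Sunny

obs = [0, 1, 2]                         # observed: Walk, Shop, Clean

# Forward algorithm: P(O | lambda)
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
p_obs = alpha.sum()
print("P(O|lambda) =", p_obs)           # ≈ 0.0336

# Viterbi algorithm: most likely hidden state sequence
delta = pi * B[:, obs[0]]
psi = []
for o in obs[1:]:
    trans = delta[:, None] * A          # delta_{t-1}(i) * a_ij
    psi.append(trans.argmax(axis=0))    # best predecessor for each state j
    delta = trans.max(axis=0) * B[:, o]
path = [int(delta.argmax())]
for back in reversed(psi):
    path.append(int(back[path[-1]]))
path.reverse()
print("Most likely path:", path)        # [1, 0, 0] -> Sunny, Rainy, Rainy
```

Notice how small p_obs already is after three steps; for longer sequences a log-probability version is preferable, as discussed in Section 9.2.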
Pros: simple, interpretable, no training data needed.

3.2 Counting from Labeled Data (Supervised Learning)

If you have both observation sequences AND the corresponding true hidden state sequences (labeled data), you can directly count transitions and emissions:
Transition Matrix:
aij = Count(state i → state j) / Count(state i)
Emission Matrix:
bi(k) = Count(state i emits Ok) / Count(state i)
Initial Distribution:
πi = Count(sequences starting with state i) / Total sequences
Example Calculation: Given the labeled sequence S0→S0→S1→S1→S0, the observed transitions are S0→S0 (once), S0→S1 (once), S1→S1 (once), and S1→S0 (once). S0 appears as a source state 2 times and S1 appears 2 times, so the estimates are a00 = 1/2, a01 = 1/2, a10 = 1/2, a11 = 1/2.
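The counts in this example can be reproduced in a few lines (a sketch, with states encoded as the integers 0 and 1):

```python
from collections import Counter

# Labeled hidden-state sequence from the example: S0->S0->S1->S1->S0
states = [0, 0, 1, 1, 0]

trans = Counter(zip(states[:-1], states[1:]))  # counts of (from, to) pairs
outgoing = Counter(states[:-1])                # times each state occurs as "from"

a = {(i, j): n / outgoing[i] for (i, j), n in trans.items()}
print(a)  # every estimated transition probability comes out to 0.5
```

The emission matrix is estimated the same way, counting (state, observation) pairs instead of (state, state) pairs.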
Pros: simple, directly computable, statistically principled.

3.3 Baum-Welch Algorithm (Unsupervised Learning)

When you only have observation sequences (no hidden state labels), the Baum-Welch algorithm can estimate the parameters iteratively.

Algorithm Overview (Expectation-Maximization):
Baum-Welch Iteration Flow:

  Initialize λ(0) → E-step (expected counts) → M-step (update λ) → Converged?
                        ↑_________________ No, repeat _________________|
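To make the E-step/M-step loop concrete, here is a sketch of a single Baum-Welch update for one observation sequence (an unscaled NumPy implementation written for this tutorial, so suitable only for short sequences):

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM update of (A, B, pi) for a single observation sequence."""
    N, T = A.shape[0], len(obs)
    obs = np.asarray(obs)
    # E-step: forward and backward passes (unscaled; fine for short T)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # gamma[t, i] = P(state i at t | O); xi[t, i, j] = P(i at t, j at t+1 | O)
    gamma = alpha * beta / likelihood
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: re-estimate parameters from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood
```

Iterating baum_welch_step until the returned likelihood stops improving yields a locally optimal λ; EM guarantees the likelihood never decreases between iterations.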
Pros: no labeled data required; finds locally optimal parameters.

3.4 Comparison of Methods

Method                  | Data required                       | Character of result
------------------------|-------------------------------------|--------------------------------
Manual specification    | None (domain knowledge)             | Simple, interpretable
Counting (supervised)   | Observations + true hidden states   | Directly computable estimates
Baum-Welch (unsupervised)| Observations only                  | Locally optimal parameters
Practical Tips: Work in log probabilities to avoid numerical underflow (see Section 9.2), and prefer supervised counting whenever labeled data is available, since Baum-Welch converges only to a local optimum that depends on the initialization.
4. Key Differences: HMM vs Markov Chain

                 | Markov Chain            | Hidden Markov Model
-----------------|-------------------------|----------------------------------------
States           | Directly observable     | Hidden (unobservable)
What we see      | The state itself        | Emissions that depend on the state
Parameters       | Transition matrix A, π  | A and π, plus emission matrix B
Inference needed | None (state is known)   | Infer hidden states from observations
5. Applications of HMMs

HMMs are used wherever a hidden process must be inferred from noisy observations, for example:

- Speech recognition (phonemes are hidden; acoustic features are observed)
- Part-of-speech tagging (tags are hidden; words are observed)
- Bioinformatics (e.g., gene regions hidden in an observed DNA sequence)
- Anomaly detection, such as the "dishonest casino" example used in this simulation
6. Understanding the Simulation

6.1 The State Diagram

The circular nodes in the diagram represent hidden states. Unlike a regular Markov Chain, where you can observe the state directly, in an HMM these states are not visible to an observer—hence "hidden". The arrows show transition probabilities between states.
6.2 The Three Matrices

Initial Distribution (π): the probability of starting in each hidden state. When you click "Reset", the system randomly selects an initial hidden state based on this distribution.

Transition Matrix (A) — Hidden → Hidden:
Each row shows the probability of transitioning from the current hidden state to any other hidden state.
Row i, Column j gives P(next hidden state = Sj | current hidden state = Si). This matrix controls the arrows between circular nodes in the diagram.

Emission Matrix (B) — Hidden → Observed:
Each row shows the probability of emitting (observing) each observation given the current hidden state.
Row i, Column k gives P(observe Ok | in hidden state Si). This matrix determines what observation appears in the "Observation Sequence" panel.
Important Distinction: In the simulation, S0, S1, S2 are HIDDEN states (shown in the state diagram but NOT directly observable in a real scenario). O0, O1, O2 are OBSERVATIONS (shown in the "Observation Sequence" panel — this is all a real observer would see).
6.3 The HMM Process

Each simulation step follows this process:

1. From the current hidden state Si, sample the next hidden state Sj using row i of the Transition Matrix (A).
2. From the new hidden state Sj, sample an observation Ok using row j of the Emission Matrix (B).
3. Append Sj to the true hidden state sequence and Ok to the observation sequence.
One Step Visualization:

  Si (current, hidden) —A→ Sj (next, hidden) —B→ Ok (observed!)
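The step above can be sketched as a small helper (a hypothetical implementation assuming NumPy; the simulation's actual code is not shown in this tutorial):

```python
import numpy as np

def hmm_step(state, A, B, rng):
    """One simulation step: hidden transition via A, then emission via B."""
    next_state = rng.choice(A.shape[0], p=A[state])        # hidden -> hidden
    observation = rng.choice(B.shape[1], p=B[next_state])  # hidden -> observed
    return next_state, observation

# Demo with illustrative 2-state matrices
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.2], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
state, hidden_seq, obs_seq = 0, [], []
for _ in range(5):
    state, o = hmm_step(state, A, B, rng)
    hidden_seq.append(int(state))
    obs_seq.append(int(o))
print("hidden:", hidden_seq, "observed:", obs_seq)
```

A real observer would see only obs_seq; hidden_seq is exactly the "True Hidden State Sequence" panel that the simulation reveals for teaching purposes.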
6.4 The Viterbi Display

The Viterbi panel shows the δ (delta) values computed by the Viterbi algorithm at each time step. The highlighted cells indicate the most likely path through the hidden states. This is the algorithm's best guess of what the hidden states were, given only the observation sequence.

Interpretation: The "Most Likely Path" shown at the bottom is the Viterbi algorithm's reconstruction of the hidden state sequence. Compare it with the "True Hidden State Sequence" to see how well the algorithm performs.

6.5 The Forward Probability

The Forward Algorithm computes P(O|λ), the probability of observing the entire observation sequence given the model. This value typically becomes exponentially small as the sequence grows, which is why it is displayed in scientific notation.

7. Usage Guide

7.1 Controls

Click Step ▶ to advance the simulation by one step; click Reset to clear the sequences and re-sample the initial hidden state from the Initial Distribution (π).
7.2 Presets Explained

Presets load ready-made parameter sets. For example, the Casino preset (used in Experiment 8.4) models a casino that secretly switches between a fair die and a loaded die favoring 6: which die is in use is the hidden state, and the rolls are the observations.
8. Experiments to Try

8.1 Observe Viterbi Accuracy

Run the simulation and compare the "True Hidden State Sequence" with the "Most Likely Path" from Viterbi. Notice how the algorithm often correctly reconstructs the hidden states but sometimes makes errors, especially when emission probabilities are similar across states.

8.2 Strong vs Weak Emissions

Modify the Emission Matrix to make observations more distinctive (e.g., state 0 only emits O0, state 1 only emits O1). Observe how the Viterbi algorithm becomes nearly perfect when emissions are unambiguous.

8.3 Forward Probability Decay

Watch the Forward Probability P(O|λ) as the observation sequence grows. Notice how it decreases exponentially. This is why HMMs often use log probabilities in practice.

8.4 Casino Detection

Load the Casino preset. The loaded die favors rolling 6. Watch how the Viterbi algorithm tries to detect when the casino is using the loaded die based on the sequence of rolls.

9. Mathematical Details

9.1 Complexity Analysis
Forward algorithm: O(N²T).  Viterbi algorithm: O(N²T).  Baum-Welch: O(N²T) per iteration.

Where N = number of states, T = sequence length.

9.2 Numerical Considerations

In practice, HMM implementations use log probabilities to avoid numerical underflow. The simulation uses raw probabilities for clarity, but may show very small values for long sequences.

9.3 Scaling

The Forward algorithm can use scaling factors ct = 1/Σi αt(i) to normalize the α values at each step, preventing underflow while still allowing computation of P(O|λ) = Πt (1/ct).

10. Limitations and Extensions

Current Simulation Limitations:

- Learning is not demonstrated: the Baum-Welch algorithm is described but never run in the simulation.
- Raw probabilities are used instead of log probabilities, so P(O|λ) underflows toward zero for long sequences.
Extensions to Explore:

- Implement Baum-Welch training to learn A, B, and π from observation sequences alone.
- Rework the Forward and Viterbi computations in log space, or with the scaling factors from Section 9.3, to handle long sequences.
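As a sketch of the log-space idea (a minimal version written for this tutorial, assuming NumPy and using the log-sum-exp trick for stability):

```python
import numpy as np

def log_forward(pi, A, B, obs):
    """Forward algorithm with log-domain alphas to avoid underflow."""
    la = np.log(pi) + np.log(B[:, obs[0]])     # log alpha_1
    for o in obs[1:]:
        m = la.max()                           # log-sum-exp shift
        la = m + np.log(np.exp(la - m) @ A) + np.log(B[:, o])
    m = la.max()
    return m + np.log(np.exp(la - m).sum())    # log P(O | lambda)

# Even a 1000-step sequence is no problem in log space,
# where the raw probability would underflow to 0.0
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.6, 0.4], [0.3, 0.7]])
print(log_forward(pi, A, B, [0, 1] * 500))
```

The same shift-by-the-maximum trick applies to the Viterbi recursion, where it is even simpler: max and argmax can be taken directly on log values, since log is monotonic.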