🎯 Hidden Markov Model (HMM) Framework

Understanding Regime Detection with Expectation-Maximization

📋 Overview

Goal: Detect different "regimes" (states) in a time series where each regime has different statistical properties.

Example: Model accuracy time series might have:
  • HIGH regime: Mean accuracy ≈ 0.83
  • MED regime: Mean accuracy ≈ 0.65
  • LOW regime: Mean accuracy ≈ 0.46

The challenge: We don't observe which regime each time point belongs to - it's "hidden"! We only see the accuracy values.

❓ The Problem Setup

What We Have:

  • Time series of values: [0.82, 0.85, 0.83, 0.45, 0.48, 0.44, 0.81, 0.84, ...]
  • We want to find n regimes (e.g., n=3 for HIGH, MED, LOW)

What We Need to Learn:

Parameter     Meaning
μ (mu)        Mean value for each regime
σ² (sigma²)   Variance for each regime (how spread out)
Transitions   Probability of switching from one regime to another

🔄 The EM Algorithm (How It Works)

1️⃣ INITIALIZE (Random) → 2️⃣ E-STEP (Calculate Probs) → 3️⃣ M-STEP (Update Params) → 4️⃣ REPEAT (Until Converge)

Step 1: Initialization

Start with random guesses for all parameters:

  • Random μ for each regime (e.g., μ_HIGH = 0.75, μ_MED = 0.60, μ_LOW = 0.50)
  • Random σ² for each regime
  • Random transition probabilities
model = hmm.GaussianHMM(
    n_components=3,  # 3 regimes
    random_state=42  # Reproducible random init
)

Step 2: E-Step (Expectation)

Calculate the probability that each value belongs to each regime:

P(value | regime) = (1 / √(2πσ²)) × exp(-(value - μ)² / (2σ²))
(This is a probability density, so values greater than 1 are normal — they get normalized below.)
Example: For value = 0.82
  • P(0.82 | HIGH with μ=0.80, σ²=0.01) = 3.90 (close to mean → high!)
  • P(0.82 | MED with μ=0.65, σ²=0.02) = 1.38 (medium distance)
  • P(0.82 | LOW with μ=0.45, σ²=0.01) = 0.004 (far from mean → low!)
Normalize: Total = 3.90 + 1.38 + 0.004 = 5.28
  • P(HIGH | 0.82) = 3.90 / 5.28 = 74%
  • P(MED | 0.82) = 1.38 / 5.28 = 26%
  • P(LOW | 0.82) = 0.004 / 5.28 = 0.1%

Do this for every value in the time series! (In the full HMM E-step, the forward-backward algorithm also folds in the transition probabilities; the simple normalization above is the mixture-model view.)
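The normalization above can be reproduced in a few lines of NumPy. The μ and σ² values are the illustrative ones from the example, not fitted parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Gaussian density: (1/√(2πσ²)) · exp(-(x-μ)²/(2σ²))."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

value = 0.82
mus = np.array([0.80, 0.65, 0.45])    # HIGH, MED, LOW means (from the example)
vars_ = np.array([0.01, 0.02, 0.01])  # per-regime variances

densities = gaussian_pdf(value, mus, vars_)   # ≈ [3.91, 1.37, 0.004]
posteriors = densities / densities.sum()      # normalize to probabilities
print(np.round(posteriors, 3))                # ≈ [0.74, 0.26, 0.001]
```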

Step 3: M-Step (Maximization)

Update parameters using the probabilities from the E-step:

μ_HIGH = Σ(value × P(HIGH|value)) / Σ(P(HIGH|value))
Example: Update μ_HIGH using 5 values:
Time 0 (0.82): 40% HIGH → contributes 0.82 × 0.40 = 0.328
Time 1 (0.85): 50% HIGH → contributes 0.85 × 0.50 = 0.425
Time 2 (0.45): 10% HIGH → contributes 0.45 × 0.10 = 0.045
Time 3 (0.48): 20% HIGH → contributes 0.48 × 0.20 = 0.096
Time 4 (0.81): 60% HIGH → contributes 0.81 × 0.60 = 0.486

μ_HIGH = (0.328 + 0.425 + 0.045 + 0.096 + 0.486) / (0.40 + 0.50 + 0.10 + 0.20 + 0.60)
       = 1.38 / 1.80 = 0.767
                        

Similarly update σ² and transition probabilities.
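The weighted-mean update above is just np.average with the E-step responsibilities as weights (the numbers are the illustrative ones from the table):

```python
import numpy as np

values = np.array([0.82, 0.85, 0.45, 0.48, 0.81])  # observations at times 0-4
p_high = np.array([0.40, 0.50, 0.10, 0.20, 0.60])  # P(HIGH | value) from the E-step

# μ_HIGH = Σ(value × P(HIGH|value)) / Σ P(HIGH|value)
mu_high = np.average(values, weights=p_high)
print(round(float(mu_high), 3))  # 0.767
```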

Step 4: Repeat Until Convergence

Go back to the E-step with the new parameters and repeat:

  • Early iterations: Big changes in probabilities
  • Later iterations: Smaller and smaller changes
  • Convergence: Probabilities stabilize (barely change)
✅ After ~100 iterations (or convergence), you have:
  • Final stable μ, σ² for each regime
  • Final transition probabilities
  • Sharp probability assignments like: Time 0 is 98% HIGH, Time 3 is 95% LOW
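A minimal pure-NumPy sketch of this loop, using the simplified mixture view (transition probabilities omitted) and a deterministic quantile initialization instead of the library's random one:

```python
import numpy as np

def em_mixture_1d(x, n_states, n_iter=100, tol=1e-6):
    """Simplified EM for 1-D data: mixture view, transitions ignored."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, n_states))  # spread-out init
    var = np.full(n_states, np.var(x))
    w = np.full(n_states, 1.0 / n_states)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: P(state | value) for every point
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        joint = w * pdf
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = post.sum(axis=0)
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        w = nk / len(x)
        # Convergence: stop when the log-likelihood barely changes
        ll = np.log(joint.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return mu[order], var[order], w[order]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.46, 0.02, 50), rng.normal(0.83, 0.02, 50)])
mu, var, w = em_mixture_1d(x, n_states=2)
print(np.round(mu, 2))  # two regime means near 0.46 and 0.83
```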

📊 Model Selection with BIC

How do we know if 3 regimes is better than 2 or 4? Use BIC (Bayesian Information Criterion)!

BIC = -2 × log_likelihood + n_params × log(n_samples)

Two Parts:

Part                        What It Measures              Effect
-2 × log_likelihood         How well the model fits data  Lower = better fit (good!)
n_params × log(n_samples)   Model complexity              More params = higher penalty (bad!)
Example: Compare different n values
  • n=2: BIC=150 (underfits, too simple)
  • n=3: BIC=124 ✅ BEST! (good balance)
  • n=4: BIC=135 (overfits, too complex)
  • n=5: BIC=148 (definitely overfitting)
Pick n=3 (lowest BIC)
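The comparison above is just the BIC formula applied per candidate n. A sketch, assuming the total log-likelihood of each fit is already in hand (the log-likelihood numbers below are made up for illustration):

```python
import numpy as np

def bic(log_likelihood, n, n_samples):
    """BIC = -2·log_likelihood + n_params·log(n_samples)."""
    n_params = n ** 2 + 2 * n  # transitions + means + variances
    return -2 * log_likelihood + n_params * np.log(n_samples)

# Hypothetical total log-likelihoods for fits with n = 2..5 (illustrative only)
lls = {2: 100.0, 3: 135.0, 4: 140.0, 5: 143.0}
n_samples = 100
scores = {n: bic(ll, n, n_samples) for n, ll in lls.items()}
best_n = min(scores, key=scores.get)  # candidate with the lowest BIC
```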

✅ Validation Checks

Even with good BIC, the solution might be garbage! Check:

Check 1: Too Many Regime Switches?

If regimes switch more than 10% of the time → INVALID

Bad: 100 time points with 50 switches = model is just chasing noise!
Good: 100 time points with 5 switches = stable regimes
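Check 1 can be implemented by counting label changes between consecutive time points; `regimes` here is a hypothetical array of hard labels like those produced by predict_proba + argmax:

```python
import numpy as np

regimes = np.array([0, 0, 0, 0, 2, 2, 2, 0, 0, 0])  # hypothetical hard labels
n_switches = int(np.sum(np.diff(regimes) != 0))      # consecutive labels that differ
switch_rate = n_switches / len(regimes)
valid = switch_rate <= 0.10
print(n_switches, round(switch_rate, 2), valid)  # 2 0.2 False
```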

Check 2: Are Regimes Distinct?

If two regime means are closer than 0.05 → INVALID

Bad: μ_HIGH = 0.820, μ_MED = 0.825 (only 0.005 apart!)
These are basically the same regime!
Good: μ_HIGH = 0.83, μ_MED = 0.65, μ_LOW = 0.46 (well separated)
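Check 2 reduces to the smallest gap between sorted regime means:

```python
import numpy as np

means = np.array([0.83, 0.65, 0.46])  # fitted regime means (the "good" example)
min_gap = float(np.min(np.diff(np.sort(means))))  # smallest adjacent gap
distinct = min_gap >= 0.05
print(round(min_gap, 2), distinct)  # 0.18 True
```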

💻 Code Mapping

Where Each Step Happens:

🔹 Initialization (Random Start)

model = hmm.GaussianHMM(n_components=n, random_state=42)
# Creates model with random μ, σ², transitions

🔹 EM Loop (E-step + M-step repeated)

model.fit(accuracies)
# *** THIS IS WHERE THE ENTIRE EM ALGORITHM RUNS ***
# Internally alternates E-step and M-step until convergence
# (hmmlearn expects a 2-D array, e.g. accuracies.reshape(-1, 1))

🔹 Calculate BIC

log_likelihood = model.score(accuracies)
# hmmlearn's score() already returns the TOTAL log-likelihood (not a per-sample mean)
n_params = n**2 + 2*n  # transitions + means + variances (approximate free-parameter count)
bic = -2 * log_likelihood + n_params * np.log(len(accuracies))

🔹 Detect Regimes (After Fitting)

regime_probs = model.predict_proba(accuracies)
# Uses fitted μ, σ² to calculate P(regime | value)
# Returns: [[0.95, 0.03, 0.02], [0.97, 0.02, 0.01], ...]
regimes = np.argmax(regime_probs, axis=1)
# Convert to hard labels: [0, 0, 2, 2, ...]

🎯 Summary

Start: Time Series
  ↓ For n = 2, 3, 4, 5, 6:
Random Init → EM Loop (E-step + M-step) → Validate (check switches & distinctness)
  ↓
Select Best: lowest BIC among valid solutions
Key Takeaways:
  • HMM learns hidden regime structure from data
  • EM algorithm alternates between E-step (calculate probs) and M-step (update params)
  • BIC helps choose the right number of regimes
  • Validation ensures solutions are meaningful, not garbage
  • Final model can classify new data into regimes