🎯 Hidden Markov Model (HMM) Framework

Understanding Regime Detection with Expectation-Maximization

📋 Overview

Goal: Detect different "regimes" (states) in a time series where each regime has different statistical properties.

Example: Model accuracy time series might have:
  • HIGH regime: Mean accuracy ≈ 0.83
  • MED regime: Mean accuracy ≈ 0.65
  • LOW regime: Mean accuracy ≈ 0.46

The challenge: We don't observe which regime each time point belongs to - it's "hidden"! We only see the accuracy values.

❓ The Problem Setup

What We Have:

  • Time series of values: [0.82, 0.85, 0.83, 0.45, 0.48, 0.44, 0.81, 0.84, ...]
  • We want to find n regimes (e.g., n=3 for HIGH, MED, LOW)

What We Need to Learn:

Parameter     Meaning
μ (mu)        Mean value for each regime
σ² (sigma²)   Variance for each regime (how spread out)
Transitions   Probability of switching from one regime to another

🔄 The EM Algorithm (How It Works)

1️⃣ INITIALIZE (Random) → 2️⃣ E-STEP (Calculate Probs) → 3️⃣ M-STEP (Update Params) → 4️⃣ REPEAT (Until Converge)

Step 1: Initialization

Start with random guesses for all parameters:

  • Random μ for each regime (e.g., μ_HIGH = 0.75, μ_MED = 0.60, μ_LOW = 0.50)
  • Random σ² for each regime
  • Random transition probabilities
model = hmm.GaussianHMM(
    n_components=3,  # 3 regimes
    random_state=42  # Reproducible random init
)

Step 2: E-Step (Expectation)

Calculate the probability that each value belongs to each regime:

P(value | regime) = (1 / √(2πσ²)) × exp(-(value - μ)² / (2σ²))
(This is a probability density, so values greater than 1 are normal — they get normalized below.)
Example: For value = 0.82
  • P(0.82 | HIGH with μ=0.80, σ²=0.01) = 3.90 (close to mean → high!)
  • P(0.82 | MED with μ=0.65, σ²=0.02) = 1.38 (medium distance)
  • P(0.82 | LOW with μ=0.45, σ²=0.01) = 0.004 (far from mean → low!)
Normalize: Total = 3.90 + 1.38 + 0.004 = 5.28
  • P(HIGH | 0.82) = 3.90 / 5.28 = 74%
  • P(MED | 0.82) = 1.38 / 5.28 = 26%
  • P(LOW | 0.82) = 0.004 / 5.28 = 0.1%

Do this for every value in the time series! (In the full HMM E-step, the forward-backward algorithm also folds in the transition probabilities; the simple normalization above is the mixture-model view.)
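The normalization above can be reproduced in a few lines of NumPy. The μ and σ² values are the illustrative ones from the example, not fitted parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Gaussian density: (1/√(2πσ²)) · exp(-(x-μ)²/(2σ²))."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

value = 0.82
mus = np.array([0.80, 0.65, 0.45])    # HIGH, MED, LOW means (from the example)
vars_ = np.array([0.01, 0.02, 0.01])  # per-regime variances

densities = gaussian_pdf(value, mus, vars_)   # ≈ [3.91, 1.37, 0.004]
posteriors = densities / densities.sum()      # normalize to probabilities
print(np.round(posteriors, 3))                # ≈ [0.74, 0.26, 0.001]
```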

Step 3: M-Step (Maximization)

Update parameters using the probabilities from the E-step:

μ_HIGH = Σ(value × P(HIGH|value)) / Σ(P(HIGH|value))
Example: Update μ_HIGH using 5 values:
Time 0 (0.82): 40% HIGH → contributes 0.82 × 0.40 = 0.328
Time 1 (0.85): 50% HIGH → contributes 0.85 × 0.50 = 0.425
Time 2 (0.45): 10% HIGH → contributes 0.45 × 0.10 = 0.045
Time 3 (0.48): 20% HIGH → contributes 0.48 × 0.20 = 0.096
Time 4 (0.81): 60% HIGH → contributes 0.81 × 0.60 = 0.486

μ_HIGH = (0.328 + 0.425 + 0.045 + 0.096 + 0.486) / (0.40 + 0.50 + 0.10 + 0.20 + 0.60)
       = 1.38 / 1.80 = 0.767
                        

Similarly update σ² and transition probabilities.
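The weighted-mean update above is just np.average with the E-step responsibilities as weights (the numbers are the illustrative ones from the table):

```python
import numpy as np

values = np.array([0.82, 0.85, 0.45, 0.48, 0.81])  # observations at times 0-4
p_high = np.array([0.40, 0.50, 0.10, 0.20, 0.60])  # P(HIGH | value) from the E-step

# μ_HIGH = Σ(value × P(HIGH|value)) / Σ P(HIGH|value)
mu_high = np.average(values, weights=p_high)
print(round(float(mu_high), 3))  # 0.767
```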

Step 4: Repeat Until Convergence

Go back to the E-step with the new parameters and repeat:

  • Early iterations: Big changes in probabilities
  • Later iterations: Smaller and smaller changes
  • Convergence: Probabilities stabilize (barely change)
✅ After ~100 iterations (or convergence), you have:
  • Final stable μ, σ² for each regime
  • Final transition probabilities
  • Sharp probability assignments like: Time 0 is 98% HIGH, Time 3 is 95% LOW
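A minimal pure-NumPy sketch of this loop, using the simplified mixture view (transition probabilities omitted) and a deterministic quantile initialization instead of the library's random one:

```python
import numpy as np

def em_mixture_1d(x, n_states, n_iter=100, tol=1e-6):
    """Simplified EM for 1-D data: mixture view, transitions ignored."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, n_states))  # spread-out init
    var = np.full(n_states, np.var(x))
    w = np.full(n_states, 1.0 / n_states)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: P(state | value) for every point
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        joint = w * pdf
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = post.sum(axis=0)
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        w = nk / len(x)
        # Convergence: stop when the log-likelihood barely changes
        ll = np.log(joint.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return mu[order], var[order], w[order]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.46, 0.02, 50), rng.normal(0.83, 0.02, 50)])
mu, var, w = em_mixture_1d(x, n_states=2)
print(np.round(mu, 2))  # two regime means near 0.46 and 0.83
```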

📊 Model Selection with BIC

How do we know if 3 regimes is better than 2 or 4? Use BIC (Bayesian Information Criterion)!

BIC = -2 × log_likelihood + n_params × log(n_samples)

Two Parts:

Part                        What It Measures              Effect
-2 × log_likelihood         How well the model fits data  Lower = better fit (good!)
n_params × log(n_samples)   Model complexity              More params = higher penalty (bad!)
Example: Compare different n values
  • n=2: BIC=150 (underfits, too simple)
  • n=3: BIC=124 ✅ BEST! (good balance)
  • n=4: BIC=135 (overfits, too complex)
  • n=5: BIC=148 (definitely overfitting)
Pick n=3 (lowest BIC)
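The comparison above is just the BIC formula applied per candidate n. A sketch, assuming the total log-likelihood of each fit is already in hand (the log-likelihood numbers below are made up for illustration):

```python
import numpy as np

def bic(log_likelihood, n, n_samples):
    """BIC = -2·log_likelihood + n_params·log(n_samples)."""
    n_params = n ** 2 + 2 * n  # transitions + means + variances
    return -2 * log_likelihood + n_params * np.log(n_samples)

# Hypothetical total log-likelihoods for fits with n = 2..5 (illustrative only)
lls = {2: 100.0, 3: 135.0, 4: 140.0, 5: 143.0}
n_samples = 100
scores = {n: bic(ll, n, n_samples) for n, ll in lls.items()}
best_n = min(scores, key=scores.get)  # candidate with the lowest BIC
```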

✅ Validation Checks

Even with good BIC, the solution might be garbage! Check:

Check 1: Too Many Regime Switches?

If regimes switch more than 10% of the time → INVALID

Bad: 100 time points with 50 switches = model is just chasing noise!
Good: 100 time points with 5 switches = stable regimes
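Check 1 can be implemented by counting label changes between consecutive time points; `regimes` here is a hypothetical array of hard labels like those produced by predict_proba + argmax:

```python
import numpy as np

regimes = np.array([0, 0, 0, 0, 2, 2, 2, 0, 0, 0])  # hypothetical hard labels
n_switches = int(np.sum(np.diff(regimes) != 0))      # consecutive labels that differ
switch_rate = n_switches / len(regimes)
valid = switch_rate <= 0.10
print(n_switches, round(switch_rate, 2), valid)  # 2 0.2 False
```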

Check 2: Are Regimes Distinct?

If two regime means are closer than 0.05 → INVALID

Bad: μ_HIGH = 0.820, μ_MED = 0.825 (only 0.005 apart!)
These are basically the same regime!
Good: μ_HIGH = 0.83, μ_MED = 0.65, μ_LOW = 0.46 (well separated)
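Check 2 reduces to the smallest gap between sorted regime means:

```python
import numpy as np

means = np.array([0.83, 0.65, 0.46])  # fitted regime means (the "good" example)
min_gap = float(np.min(np.diff(np.sort(means))))  # smallest adjacent gap
distinct = min_gap >= 0.05
print(round(min_gap, 2), distinct)  # 0.18 True
```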

💻 Code Mapping

Where Each Step Happens:

🔹 Initialization (Random Start)

model = hmm.GaussianHMM(n_components=n, random_state=42)
# Creates model with random μ, σ², transitions

🔹 EM Loop (E-step + M-step repeated)

model.fit(accuracies)
# *** THIS IS WHERE THE ENTIRE EM ALGORITHM RUNS ***
# Internally alternates E-step and M-step until convergence
# (hmmlearn expects a 2-D array, e.g. accuracies.reshape(-1, 1))

🔹 Calculate BIC

log_likelihood = model.score(accuracies)
# hmmlearn's score() already returns the TOTAL log-likelihood (not a per-sample mean)
n_params = n**2 + 2*n  # transitions + means + variances (approximate free-parameter count)
bic = -2 * log_likelihood + n_params * np.log(len(accuracies))

🔹 Detect Regimes (After Fitting)

regime_probs = model.predict_proba(accuracies)
# Uses fitted μ, σ² to calculate P(regime | value)
# Returns: [[0.95, 0.03, 0.02], [0.97, 0.02, 0.01], ...]
regimes = np.argmax(regime_probs, axis=1)
# Convert to hard labels: [0, 0, 2, 2, ...]

🎯 Summary

Start: Time Series
  ↓ For n = 2, 3, 4, 5, 6:
Random Init → EM Loop (E-step + M-step) → Validate (check switches & distinctness)
  ↓
Select Best: lowest BIC among valid solutions
Key Takeaways:
  • HMM learns hidden regime structure from data
  • EM algorithm alternates between E-step (calculate probs) and M-step (update params)
  • BIC helps choose the right number of regimes
  • Validation ensures solutions are meaningful, not garbage
  • Final model can classify new data into regimes