🚨 CPI Time Series Optimization Disaster 🚨

Why Data Scaling Matters in Machine Learning Optimization

📊The Data: Two Faces of the Same Economic Story

We start with two CSV files containing Consumer Price Index (CPI) data - the same economic information presented in drastically different scales:

🔴 CPI Levels (CPIAUCSL.csv)

  • Range: 78.0 to 307.5
  • Scale: Large absolute numbers
  • Example: 250.1, 251.3, 252.8...
  • Problem: Creates numerical chaos

🟢 CPI Changes (CPIAUCSL_PCH.csv)

  • Range: -2.1% to +1.4%
  • Scale: Small percentage changes
  • Example: 0.3%, -0.1%, 0.7%...
  • Benefit: Well-behaved optimization
💡 Key Insight: These datasets contain identical economic information! CPI changes are just the month-over-month percentage changes of CPI levels. Yet they create completely different optimization landscapes.

🔬The Mathematical Setup

We create an autoregressive time series regression for both datasets:

Yₜ = β₀ + β₁ × Yₜ₋₁ + β₂ × trend + εₜ

Where:

  • Yₜ is the CPI value (or percentage change) at time t
  • Yₜ₋₁ is the previous period's value (the autoregressive term)
  • trend is a linear time index (1, 2, 3, ...)
  • εₜ is the error term

The Objective Function

We minimize the Mean Squared Error (MSE):

Loss(β) = ½ × Σ(Yₜ − Ŷₜ)²
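The setup above can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual analysis code: `ar_design` and `mse_loss` are hypothetical helper names, and the toy CPI-like values are made up for demonstration, not taken from the CSV files.

```python
import numpy as np

def ar_design(y):
    """Build the design matrix [1, Y_{t-1}, trend] for the AR(1)-plus-trend model.

    `y` is a 1-D array of CPI values (or percentage changes); the first
    observation is dropped because it has no lagged value.
    """
    y = np.asarray(y, dtype=float)
    y_lag = y[:-1]                        # Y_{t-1}
    trend = np.arange(1, len(y))          # 1, 2, 3, ...
    X = np.column_stack([np.ones(len(y_lag)), y_lag, trend])
    return X, y[1:]                       # (design matrix, Y_t targets)

def mse_loss(beta, X, y):
    """Loss(beta) = 1/2 * sum((Y_t - X @ beta)^2)."""
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2)

# Toy CPI-level-like series (illustrative values only)
y = np.array([250.1, 251.3, 252.8, 253.0, 254.2])
X, y_t = ar_design(y)
print(mse_loss(np.zeros(3), X, y_t))
```

With β = 0 the loss is simply half the sum of squared targets, which makes the function easy to sanity-check before handing it to an optimizer.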

⚠️The Numerical Problem

🔥 The Condition Number Disaster

The condition number of X'X reveals the problem.

When the condition number is very large, small changes in input can cause massive changes in output. This happens because:

  1. Scale differences: CPI levels (250) vs trend values (1, 2, 3...) create vastly different magnitudes
  2. Matrix ill-conditioning: The design matrix X becomes nearly singular
  3. Gradient explosion: Optimization algorithms receive conflicting signals

📈The Optimization Animations

Watch how each optimizer handles the ill-conditioned CPI levels (left, red) versus the well-conditioned CPI changes (right, green). The animations show the optimization path in the β₁ vs β₂ parameter space:

🥇 AdaGrad: The Consistent Champion

  • CPI Levels: Smooth descent toward distant optimum (β₁ ≈ 1.0)
  • CPI Changes: Perfect V-shaped convergence to the black star
  • Final Loss: 63,900 (levels) / 15.9 (changes)
  • Why it works: Conservative accumulated gradients prevent chaos

🔄 Adam: The Momentum Maverick

  • CPI Levels: Clean path but can't reach distant optimum
  • CPI Changes: Oscillatory dance around the target
  • Final Loss: 220,000 (levels) / 15.6 (changes) ⭐️
  • Why it struggles: Momentum causes overshoot on levels

🌪️ RMSprop: The Chaos Solver

  • CPI Levels: Diagonal descent finds excellent practical solution
  • CPI Changes: Wild oscillations but surprisingly effective
  • Final Loss: 39,200 (levels) ⭐️ / 674.4 (changes)
  • Why it wins: Finds good solutions without reaching theoretical optima
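The differing behaviors above come down to each optimizer's update rule. The sketch below implements the standard textbook versions of all three on a deliberately ill-conditioned quadratic (a stand-in for the CPI-levels loss surface, not the actual regression); the learning rates and step counts are illustrative choices, not the ones used in the animations.

```python
import numpy as np

def grad(beta):
    """Gradient of f(b) = 0.5 * (100*b0^2 + b1^2): one steep direction,
    one shallow direction, mimicking mismatched parameter scales."""
    return np.array([100.0 * beta[0], beta[1]])

def run(update, steps=500):
    beta, state = np.array([1.0, 1.0]), {}
    for _ in range(steps):
        beta = update(beta, grad(beta), state)
    return beta

def adagrad(beta, g, s, lr=0.5, eps=1e-8):
    s["G"] = s.get("G", 0.0) + g ** 2                      # accumulate squared grads
    return beta - lr * g / (np.sqrt(s["G"]) + eps)

def rmsprop(beta, g, s, lr=0.05, rho=0.9, eps=1e-8):
    s["G"] = rho * s.get("G", 0.0) + (1 - rho) * g ** 2    # decaying average
    return beta - lr * g / (np.sqrt(s["G"]) + eps)

def adam(beta, g, s, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g           # first moment (momentum)
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g ** 2      # second moment
    m_hat = s["m"] / (1 - b1 ** s["t"])                    # bias correction
    v_hat = s["v"] / (1 - b2 ** s["t"])
    return beta - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, upd in [("AdaGrad", adagrad), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, run(upd))
```

Dividing the gradient by a running root-mean-square makes all three methods behave like scaled sign-descent, which is why they cope with ill-conditioning far better than plain gradient descent; AdaGrad's ever-growing accumulator is what makes its steps steadily shrink (the "conservative" behavior noted above), while Adam's momentum term is what can cause overshoot.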

🎓The Broader Lessons

💼 For Machine Learning Practitioners:

  1. Data scaling is not optional: Always standardize or normalize your features
  2. Check condition numbers: Use np.linalg.cond() to detect ill-conditioning
  3. Algorithm choice matters: Some optimizers handle ill-conditioning better than others
  4. Learning rates need careful tuning: Different scales require different learning rates
  5. Empirical testing is crucial: Theoretical expectations don't always match reality

🔧Practical Solutions

🛠️ How to Fix the CPI Levels Problem:

```python
# Option 1: Standardization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Option 2: Log transformation
Y_log = np.log(Y / Y.shift(1))  # Convert to log returns

# Option 3: Differencing
Y_diff = Y.diff()  # First differences

# Option 4: Min-Max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
```

🎯The Bottom Line

This visualization demonstrates a fundamental truth in machine learning: the same information presented at different scales can create completely different optimization challenges. The CPI levels and changes represent identical economic data, yet one creates a numerical nightmare while the other optimizes smoothly.

The key takeaway isn't just about CPI data - it's about the critical importance of data preprocessing in any machine learning pipeline. Without proper scaling, even the most sophisticated optimization algorithms can fail on perfectly solvable problems.

🎪 The Real "Disaster": It's not the algorithms failing - it's what happens when we ignore the fundamental importance of data scaling in machine learning. The disaster is entirely preventable with proper preprocessing!