🚨 CPI Time Series Optimization Disaster 🚨

Why Data Scaling Matters in Machine Learning Optimization

📊The Data: Two Faces of the Same Economic Story

We start with two CSV files containing Consumer Price Index (CPI) data - the same economic information presented in drastically different scales:

🔴 CPI Levels (CPIAUCSL.csv)

  • Range: 78.0 to 307.5
  • Scale: Large absolute numbers
  • Example: 250.1, 251.3, 252.8...
  • Problem: Creates numerical chaos

🟢 CPI Changes (CPIAUCSL_PCH.csv)

  • Range: -2.1% to +1.4%
  • Scale: Small percentage changes
  • Example: 0.3%, -0.1%, 0.7%...
  • Benefit: Well-behaved optimization
💡 Key Insight: These datasets contain identical economic information! CPI changes are just the month-over-month percentage changes of CPI levels. Yet they create completely different optimization landscapes.

🔬The Mathematical Setup

We create an autoregressive time series regression for both datasets:

Yₜ = β₀ + β₁ × Yₜ₋₁ + β₂ × trend + εₜ

Where:

  • Yₜ is the CPI value (or percentage change) at time t
  • Yₜ₋₁ is the previous period's value (the autoregressive term)
  • trend is a linear time index (1, 2, 3, ...)
  • εₜ is the error term

The Objective Function

We minimize the Mean Squared Error (MSE):

Loss(β) = ½ × Σ(Yₜ − Ŷₜ)²
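The setup above can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual analysis code: `ar_design` and `mse_loss` are hypothetical helper names, and the toy CPI-like values are made up for demonstration, not taken from the CSV files.

```python
import numpy as np

def ar_design(y):
    """Build the design matrix [1, Y_{t-1}, trend] for the AR(1)-plus-trend model.

    `y` is a 1-D array of CPI values (or percentage changes); the first
    observation is dropped because it has no lagged value.
    """
    y = np.asarray(y, dtype=float)
    y_lag = y[:-1]                        # Y_{t-1}
    trend = np.arange(1, len(y))          # 1, 2, 3, ...
    X = np.column_stack([np.ones(len(y_lag)), y_lag, trend])
    return X, y[1:]                       # (design matrix, Y_t targets)

def mse_loss(beta, X, y):
    """Loss(beta) = 1/2 * sum((Y_t - X @ beta)^2)."""
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2)

# Toy CPI-level-like series (illustrative values only)
y = np.array([250.1, 251.3, 252.8, 253.0, 254.2])
X, y_t = ar_design(y)
print(mse_loss(np.zeros(3), X, y_t))
```

With β = 0 the loss is simply half the sum of squared targets, which makes the function easy to sanity-check before handing it to an optimizer.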

⚠️The Numerical Problem

🔥 The Condition Number Disaster

The condition number of X'X reveals the problem.

When the condition number is very large, small changes in input can cause massive changes in output. This happens because:

  1. Scale differences: CPI levels (250) vs trend values (1, 2, 3...) create vastly different magnitudes
  2. Matrix ill-conditioning: The design matrix X becomes nearly singular
  3. Gradient explosion: Optimization algorithms receive conflicting signals

📈The Optimization Animations

Watch how each optimizer handles the ill-conditioned CPI levels (left, red) versus the well-conditioned CPI changes (right, green). The animations show the optimization path in the β₁ vs β₂ parameter space:

🥇 AdaGrad: The Consistent Champion

  • CPI Levels: Smooth descent toward distant optimum (β₁ ≈ 1.0)
  • CPI Changes: Perfect V-shaped convergence to the black star
  • Final Loss: 63,900 (levels) / 15.9 (changes)
  • Why it works: Conservative accumulated gradients prevent chaos

🔄 Adam: The Momentum Maverick

  • CPI Levels: Clean path but can't reach distant optimum
  • CPI Changes: Oscillatory dance around the target
  • Final Loss: 220,000 (levels) / 15.6 (changes) ⭐️
  • Why it struggles: Momentum causes overshoot on levels

🌪️ RMSprop: The Chaos Solver

  • CPI Levels: Diagonal descent finds excellent practical solution
  • CPI Changes: Wild oscillations but surprisingly effective
  • Final Loss: 39,200 (levels) ⭐️ / 674.4 (changes)
  • Why it wins: Finds good solutions without reaching theoretical optima
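The differing behaviors above come down to each optimizer's update rule. The sketch below implements the standard textbook versions of all three on a deliberately ill-conditioned quadratic (a stand-in for the CPI-levels loss surface, not the actual regression); the learning rates and step counts are illustrative choices, not the ones used in the animations.

```python
import numpy as np

def grad(beta):
    """Gradient of f(b) = 0.5 * (100*b0^2 + b1^2): one steep direction,
    one shallow direction, mimicking mismatched parameter scales."""
    return np.array([100.0 * beta[0], beta[1]])

def run(update, steps=500):
    beta, state = np.array([1.0, 1.0]), {}
    for _ in range(steps):
        beta = update(beta, grad(beta), state)
    return beta

def adagrad(beta, g, s, lr=0.5, eps=1e-8):
    s["G"] = s.get("G", 0.0) + g ** 2                      # accumulate squared grads
    return beta - lr * g / (np.sqrt(s["G"]) + eps)

def rmsprop(beta, g, s, lr=0.05, rho=0.9, eps=1e-8):
    s["G"] = rho * s.get("G", 0.0) + (1 - rho) * g ** 2    # decaying average
    return beta - lr * g / (np.sqrt(s["G"]) + eps)

def adam(beta, g, s, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g           # first moment (momentum)
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g ** 2      # second moment
    m_hat = s["m"] / (1 - b1 ** s["t"])                    # bias correction
    v_hat = s["v"] / (1 - b2 ** s["t"])
    return beta - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, upd in [("AdaGrad", adagrad), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, run(upd))
```

Dividing the gradient by a running root-mean-square makes all three methods behave like scaled sign-descent, which is why they cope with ill-conditioning far better than plain gradient descent; AdaGrad's ever-growing accumulator is what makes its steps steadily shrink (the "conservative" behavior noted above), while Adam's momentum term is what can cause overshoot.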

🎓The Broader Lessons

💼 For Machine Learning Practitioners:

  1. Data scaling is not optional: Always standardize or normalize your features
  2. Check condition numbers: Use np.linalg.cond() to detect ill-conditioning
  3. Algorithm choice matters: Some optimizers handle ill-conditioning better than others
  4. Learning rates need careful tuning: Different scales require different learning rates
  5. Empirical testing is crucial: Theoretical expectations don't always match reality

🔧Practical Solutions

🛠️ How to Fix the CPI Levels Problem:

```python
# Option 1: Standardization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Option 2: Log transformation
Y_log = np.log(Y / Y.shift(1))  # Convert to log returns

# Option 3: Differencing
Y_diff = Y.diff()  # First differences

# Option 4: Min-Max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
```

🎯The Bottom Line

This visualization demonstrates a fundamental truth in machine learning: the same information presented at different scales can create completely different optimization challenges. The CPI levels and changes represent identical economic data, yet one creates a numerical nightmare while the other optimizes smoothly.

The key takeaway isn't just about CPI data - it's about the critical importance of data preprocessing in any machine learning pipeline. Without proper scaling, even the most sophisticated optimization algorithms can fail on perfectly solvable problems.

🎪 The Real "Disaster": It's not the algorithms failing - it's what happens when we ignore the fundamental importance of data scaling in machine learning. The disaster is entirely preventable with proper preprocessing!