Why Data Scaling Matters in Machine Learning Optimization
📊 The Data: Two Faces of the Same Economic Story
We start with two CSV files containing Consumer Price Index (CPI) data - the same economic information presented in drastically different scales:
🔴 CPI Levels (CPIAUCSL.csv)
Range: 78.0 to 307.5
Scale: Large absolute numbers
Example: 250.1, 251.3, 252.8...
Problem: Large, highly correlated values make the regression ill-conditioned
🟢 CPI Changes (CPIAUCSL_PCH.csv)
Range: -2.1% to +1.4%
Scale: Small percentage changes
Example: 0.3%, -0.1%, 0.7%...
Result: Well-behaved optimization
💡 Key Insight: These datasets contain identical economic information! CPI changes are just the month-over-month percentage changes of CPI levels. Yet they create completely different optimization landscapes.
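That relationship is easy to check numerically. A minimal sketch, using a few made-up level values rather than the actual CPIAUCSL.csv series:

```python
import numpy as np

# Hypothetical CPI-like levels (illustrative values, not the real FRED data)
levels = np.array([250.1, 251.3, 252.8, 252.5, 253.1])

# Month-over-month percentage change: 100 * (x_t / x_{t-1} - 1)
pct_change = 100.0 * (levels[1:] / levels[:-1] - 1.0)

print(np.round(pct_change, 2))  # small values near zero, like the PCH series
```

The transformation is invertible (given the first level), which is why the two files carry the same information despite living on such different scales.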
🔬 The Mathematical Setup
We fit an autoregressive time series regression to both datasets: each month's value is predicted from the two preceding months, yₜ = β₀ + β₁yₜ₋₁ + β₂yₜ₋₂ + εₜ, estimated by least squares.
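A minimal sketch of that setup, assuming an AR(2) specification (two lagged values plus an intercept, consistent with the β₁ vs β₂ plots) and synthetic stand-in series rather than the actual CSV files. The condition number of the design matrix is what separates the two datasets:

```python
import numpy as np

def ar2_design(y):
    """Design matrix [1, y_{t-1}, y_{t-2}] and target y_t for an AR(2) regression."""
    X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
    return X, y[2:]

rng = np.random.default_rng(0)
# Synthetic stand-ins: a trending level series and its percentage changes
levels = 78.0 + np.cumsum(rng.uniform(0.1, 0.6, size=400))
changes = 100.0 * np.diff(levels) / levels[:-1]

X_lvl, _ = ar2_design(levels)
X_chg, _ = ar2_design(changes)

# Large, nearly collinear lag columns vs. small, well-spread ones
print(f"cond(levels)  = {np.linalg.cond(X_lvl):.1e}")
print(f"cond(changes) = {np.linalg.cond(X_chg):.1e}")
```

A large condition number means the loss surface is a long, narrow valley: gradient steps sized for one direction overshoot in another.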
Watch how each optimizer handles the ill-conditioned CPI levels (left, red) versus the well-conditioned CPI changes (right, green). The animations show the optimization path in the β₁ vs β₂ parameter space:
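The animations can't be reproduced here, but the behavior they show can be sketched with plain gradient descent on the two problems (synthetic series, an assumed AR(2) model, and one hand-picked learning rate shared by both runs):

```python
import numpy as np

np.seterr(over="ignore", invalid="ignore")  # silence warnings from the divergent run

def ar2_design(y):
    """Design matrix [1, y_{t-1}, y_{t-2}] and target y_t."""
    return np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]]), y[2:]

def gd_loss(X, y, lr=0.01, steps=500):
    """Final MSE after plain gradient descent; inf if the iterates blow up."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        beta -= lr * 2.0 / len(y) * X.T @ (X @ beta - y)
        if not np.all(np.isfinite(beta)):
            return float("inf")
    return float(np.mean((X @ beta - y) ** 2))

rng = np.random.default_rng(1)
levels = 78.0 + np.cumsum(rng.uniform(0.1, 0.6, size=400))
changes = 100.0 * np.diff(levels) / levels[:-1]

loss_lvl = gd_loss(*ar2_design(levels))
loss_chg = gd_loss(*ar2_design(changes))

print(f"levels : {loss_lvl}")   # diverges at this step size
print(f"changes: {loss_chg}")   # stays finite and shrinks
```

The same learning rate that works for the small-scale changes is hundreds of times too large for the raw levels, which is exactly the contrast the red and green animations illustrate.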
This visualization demonstrates a fundamental truth in machine learning: the same information presented at different scales can create completely different optimization challenges. The CPI levels and changes represent identical economic data, yet one creates a numerical nightmare while the other optimizes smoothly.
The key takeaway isn't just about CPI data - it's about the critical importance of data preprocessing in any machine learning pipeline. Without proper scaling, even the most sophisticated optimization algorithms can fail on perfectly solvable problems.
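One hedged sketch of such a preprocessing step: z-scoring the level series before building the lagged regression shrinks the feature scale, so the same plain gradient descent that diverged on raw levels now converges (synthetic data, assumed AR(2) model, and an illustrative learning rate):

```python
import numpy as np

np.seterr(over="ignore", invalid="ignore")  # silence warnings from the divergent run

def ar2_design(y):
    """Design matrix [1, y_{t-1}, y_{t-2}] and target y_t."""
    return np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]]), y[2:]

def gd_loss(X, y, lr=0.01, steps=500):
    """Final MSE after plain gradient descent; inf if the iterates blow up."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        beta -= lr * 2.0 / len(y) * X.T @ (X @ beta - y)
        if not np.all(np.isfinite(beta)):
            return float("inf")
    return float(np.mean((X @ beta - y) ** 2))

rng = np.random.default_rng(2)
levels = 78.0 + np.cumsum(rng.uniform(0.1, 0.6, size=400))  # synthetic CPI-like levels

scaled = (levels - levels.mean()) / levels.std()  # z-score before building lags

loss_raw = gd_loss(*ar2_design(levels))
loss_std = gd_loss(*ar2_design(scaled))

print(f"raw levels  : {loss_raw}")   # blows up at this step size
print(f"standardized: {loss_std}")   # converges
```

Standardization doesn't remove the collinearity between adjacent lags, but it does bring the gradient magnitudes down to where an ordinary step size is stable.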
🎪 The Real "Disaster": It's not the algorithms failing - it's what happens when we ignore the fundamental importance of data scaling in machine learning. The disaster is entirely preventable with proper preprocessing!