⚡ Performance Optimization Deep Dive
The Top 3 Performance Optimizations
These optimizations transformed the backtesting system from taking hours to minutes, representing 100-1000x speedups in critical operations.
1. Multi-Index Lookup: O(n) → O(1)
The Problem:
Every time you need to find options for a specific entry date and expiration date, the original code scans potentially millions of rows, checking whether each row matches both dates.
Before (v2):
mask = (date_arr == entry_day) & (exdate_arr == expiry_day)
local_options = options.loc[mask]
This creates a boolean mask by checking every single row - if you have 5 million options contracts, you're doing 5 million comparisons every lookup.
After (v3):
lookup_key = (change_to_categorical(entry_day), change_to_categorical(expiry_day))
if lookup_key in options_indexed.index:
local_options = options_indexed.loc[lookup_key].reset_index()
What Changed:
You pre-built a multi-index on ['date', 'exdate']. Pandas now maintains a hash table internally, so when you look up (entry_day, expiry_day), it's like using a Python dictionary - near-instant access.
- In backtesting, you're doing this lookup potentially thousands of times
- Linear scan: 1000 lookups × 5M rows = 5 billion comparisons
- Index lookup: 1000 lookups × O(1) = 1000 hash lookups
- This is often 100-1000x faster in practice
The Catch:
You need to build the index once upfront. But if you're doing repeated lookups (which you are in backtesting), it pays for itself immediately.
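The pattern above can be sketched end to end. This is a minimal, self-contained illustration, not the original code: the toy `options` frame and its `strike` column are invented for the example, the `change_to_categorical` helper from the snippet is omitted, and plain Timestamps are used as keys instead.

```python
import pandas as pd

# Toy options frame; the 'date' and 'exdate' column names follow the snippets above.
options = pd.DataFrame({
    'date':   pd.to_datetime(['2024-01-02', '2024-01-02', '2024-01-03']),
    'exdate': pd.to_datetime(['2024-01-19', '2024-02-16', '2024-01-19']),
    'strike': [100.0, 105.0, 100.0],
})

# Build the multi-index once upfront; sorting lets pandas use its fast lookup engine.
options_indexed = options.set_index(['date', 'exdate']).sort_index()

entry_day = pd.Timestamp('2024-01-02')
expiry_day = pd.Timestamp('2024-01-19')
lookup_key = (entry_day, expiry_day)

# Keyed access instead of scanning every row with a boolean mask.
if lookup_key in options_indexed.index:
    # Passing the key inside a list keeps the result a DataFrame even for one row.
    local_options = options_indexed.loc[[lookup_key]].reset_index()
```

Wrapping the key in a list (`.loc[[lookup_key]]`) is a small safeguard: a bare tuple key can return a Series when only one row matches, which would break downstream code expecting a DataFrame.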
2. Merge vs isin(): O(n×m) → O(n+m)
The Problem:
You have a large options dataset (n rows) and need to filter it to only symbols you're trading (m symbols). Used this way, .isin() can degrade toward quadratic-ish behavior.
Before (v3_2):
filtered_options = _options[_options['symbol'].isin(all_syms)]
What .isin() does internally:
For each of the n rows in _options, it checks if that symbol exists in all_syms (m items). Worst case: n × m comparisons.
- 5M options rows
- 500 unique symbols you trade
- 5M × 500 = 2.5 billion comparisons
After (v3_3):
all_syms_df = pd.DataFrame({'symbol': all_syms})
filtered_options = _options.merge(all_syms_df, on='symbol', how='inner')
What .merge() does internally:
Pandas uses a hash join algorithm:
- Build hash table from smaller dataset (all_syms_df) - O(m)
- Probe hash table for each row in larger dataset (_options) - O(n)
- Total: O(n + m)
- 5M + 500 = ~5M operations (basically just scanning options once)
The gap widens dramatically as your data grows:
- Small data (10K rows, 50 symbols): maybe 2x faster
- Medium data (1M rows, 200 symbols): 10-20x faster
- Large data (5M rows, 500 symbols): 50x faster
Real-world impact:
If filtering was taking 30 seconds with .isin(), it now takes 0.5 seconds with .merge().
3. Vectorization: Python Loop → Pandas Batch Operation
The Problem:
You need to set a flag to 0 for specific dates. Python loops are notoriously slow because of interpreter overhead.
Before (v1):
business_days = pd.date_range(start=start_date, end=end_date, freq=BDay())
for day in business_days:
if day in opt['cond_exit']:
dict_options[o].at[day, 'nopnl'] = 0
What's expensive here:
- Python for loop: interpreter overhead on every iteration
- .at[day, 'nopnl']: individual cell access, no batching or optimization
- if day in opt['cond_exit']: a membership check repeated on every iteration
If you have 250 business days, that's 250 Python-level operations, each with full overhead.
After (v2):
business_days = pd.date_range(start=start_date, end=end_date, freq=BDay())
cond_exit_days = opt['cond_exit'].index
update_days = business_days.intersection(cond_exit_days)
dict_options[o].loc[update_days, 'nopnl'] = 0
What changed:
- .intersection(): set operation done in C code (pandas/numpy), not Python
- .loc[update_days, 'nopnl'] = 0: single vectorized assignment
- Pandas identifies all rows matching update_days in one pass
- Updates the entire column slice in optimized C code
- Memory is written contiguously, cache-friendly
- No Python interpreter overhead per iteration
- Single function call instead of 250+ function calls
- Pandas can optimize memory access patterns
- Typically 5-10x faster, sometimes more
The principle:
Whenever you see a loop modifying a DataFrame, ask: "Can I identify all the rows I need to modify, then update them all at once?" That's vectorization.
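The before/after above can be condensed into one self-contained sketch. The small `df` frame stands in for `dict_options[o]`, and the `cond_exit_days` index is an assumption standing in for `opt['cond_exit'].index`; the intersection-then-assign pattern is the point.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Hypothetical per-option PnL frame indexed by business day,
# standing in for dict_options[o] in the snippets above.
start_date, end_date = '2024-01-01', '2024-01-12'
business_days = pd.date_range(start=start_date, end=end_date, freq=BDay())
df = pd.DataFrame({'nopnl': 1.0}, index=business_days)

# Assumed conditional-exit dates (the role of opt['cond_exit'].index).
cond_exit_days = pd.DatetimeIndex(['2024-01-03', '2024-01-10'])

# Vectorized version: one C-level set intersection plus one batched
# assignment, instead of a Python loop doing per-cell .at writes.
update_days = business_days.intersection(cond_exit_days)
df.loc[update_days, 'nopnl'] = 0
```

`intersection()` also guards against dates in `cond_exit_days` that are not in the frame's index, which a naive `df.loc[cond_exit_days, ...]` assignment would raise on.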