⚡ Performance Optimization Deep Dive

How the backtesting system went from hours to minutes — 100–1000x speedups

The Top 3 Performance Optimizations

These optimizations transformed the backtesting system from taking hours to minutes, representing 100-1000x speedups in critical operations.

1. Multi-Index Lookup: O(n) → O(1)

The Problem:

Every time you need to find options for a specific entry date and expiration date, the original code scanned through potentially millions of rows, checking whether each row matched both dates.

Before (v2):

mask = (date_arr == entry_day) & (exdate_arr == expiry_day)
local_options = options.loc[mask]

This creates a boolean mask by checking every single row - if you have 5 million options contracts, you're doing 5 million comparisons every lookup.

After (v3):

lookup_key = (change_to_categorical(entry_day), change_to_categorical(expiry_day))
if lookup_key in options_indexed.index:
    local_options = options_indexed.loc[lookup_key].reset_index()

What Changed:

You pre-built a multi-index on ['date', 'exdate']. Pandas now maintains a hash table internally, so when you look up (entry_day, expiry_day), it's like using a Python dictionary - near-instant access.

Why It's Huge:
  • In backtesting, you're doing this lookup potentially thousands of times
  • Linear scan: 1000 lookups × 5M rows = 5 billion comparisons
  • Index lookup: 1000 lookups × O(1) = 1000 hash lookups
  • This is often 100-1000x faster in practice

The Catch:

You need to build the index once upfront. But if you're doing repeated lookups (which you are in backtesting), it pays for itself immediately.
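A minimal, self-contained sketch of the pattern on toy data (column names follow the article's 'date'/'exdate' convention; the article's change_to_categorical helper is omitted and plain Timestamps are used instead):

```python
import pandas as pd

# Toy options data using the article's 'date'/'exdate' column convention.
options = pd.DataFrame({
    "date":   pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
    "exdate": pd.to_datetime(["2024-01-19", "2024-02-16", "2024-01-19"]),
    "strike": [100.0, 105.0, 110.0],
})

# Build the multi-index once upfront; sorting keeps repeated lookups fast.
options_indexed = options.set_index(["date", "exdate"]).sort_index()

# Keyed lookup instead of a full boolean scan of every row.
key = (pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-19"))
local_options = pd.DataFrame()
if key in options_indexed.index:
    local_options = options_indexed.loc[[key]].reset_index()
```

Passing .loc a list of one tuple (.loc[[key]]) keeps the result a DataFrame even when only a single row matches.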

2. Merge vs isin(): O(n×m) → O(n+m)

The Problem:

You have a large options dataset (n rows) and need to filter it to only the symbols you're trading (m symbols). In the worst case, .isin() degrades toward quadratic behavior.

Before (v3_2):

filtered_options = _options[_options['symbol'].isin(all_syms)]

What .isin() does internally:

For each of the n rows in _options, it checks if that symbol exists in all_syms (m items). Worst case: n × m comparisons.

Example:
  • 5M options rows
  • 500 unique symbols you trade
  • 5M × 500 = 2.5 billion comparisons

After (v3_3):

all_syms_df = pd.DataFrame({'symbol': all_syms})
filtered_options = _options.merge(all_syms_df, on='symbol', how='inner')

What .merge() does internally:

Pandas uses a hash join algorithm:

  1. Build hash table from smaller dataset (all_syms_df) - O(m)
  2. Probe hash table for each row in larger dataset (_options) - O(n)
  3. Total: O(n + m)
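The two phases can be sketched in plain Python (illustrative only - pandas performs the equivalent in C, and the data here is made up):

```python
# Illustrative pure-Python hash join over the symbol column.
options_rows = [("AAPL", 1), ("MSFT", 2), ("TSLA", 3), ("AAPL", 4)]
all_syms = ["AAPL", "TSLA"]

# Phase 1: build a hash table from the smaller side - O(m).
sym_set = set(all_syms)

# Phase 2: probe it once per row of the larger side - O(n).
filtered = [row for row in options_rows if row[0] in sym_set]
# filtered == [("AAPL", 1), ("TSLA", 3), ("AAPL", 4)]
```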

Same example:
  • 5M + 500 = ~5M operations (basically just scanning options once)

Why It's Huge:

The gap widens dramatically as your data grows:

  • Small data (10K rows, 50 symbols): maybe 2x faster
  • Medium data (1M rows, 200 symbols): 10-20x faster
  • Large data (5M rows, 500 symbols): 50x faster

Real-world impact:

If filtering was taking 30 seconds with .isin(), it now takes 0.5 seconds with .merge().
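A quick sanity check on toy data (hypothetical 'mid' price column) that both approaches select the same rows, so the swap is purely a performance change:

```python
import pandas as pd

# Toy dataset; 'symbol' matches the article, 'mid' is a hypothetical price column.
_options = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", "TSLA", "AAPL"],
    "mid":    [1.0, 2.0, 3.0, 4.0],
})
all_syms = ["AAPL", "TSLA"]

# Before: per-row membership filter.
via_isin = _options[_options["symbol"].isin(all_syms)]

# After: hash join against a one-column frame of the symbols.
all_syms_df = pd.DataFrame({"symbol": all_syms})
via_merge = _options.merge(all_syms_df, on="symbol", how="inner")
```

Note that merge resets the row index, so compare the two results by values rather than by index.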

3. Vectorization: Python Loop → Pandas Batch Operation

The Problem:

You need to set a flag to 0 for specific dates. Python loops are notoriously slow because of interpreter overhead.

Before (v1):

business_days = pd.date_range(start=start_date, end=end_date, freq=BDay())

for day in business_days:
    if day in opt['cond_exit']:
        dict_options[o].at[day, 'nopnl'] = 0

What's expensive here:

  • Python for loop: interpreter overhead for each iteration
  • .at[day, 'nopnl']: individual cell access, no optimization
  • if day in opt['cond_exit']: checking membership in each iteration

If you have 250 business days, that's 250 Python-level operations, each with full overhead.

After (v2):

business_days = pd.date_range(start=start_date, end=end_date, freq=BDay())
cond_exit_days = opt['cond_exit'].index

update_days = business_days.intersection(cond_exit_days)

dict_options[o].loc[update_days, 'nopnl'] = 0

What changed:

  1. .intersection(): Set operation done in C code (pandas/numpy), not Python
  2. .loc[update_days, 'nopnl'] = 0: Single vectorized assignment
    • Pandas identifies all rows matching update_days in one pass
    • Updates entire column slice in optimized C code
    • Memory is written contiguously, cache-friendly

Why It's Huge:
  • No Python interpreter overhead per iteration
  • Single function call instead of 250+ function calls
  • Pandas can optimize memory access patterns
  • Typically 5-10x faster, sometimes more

The principle:

Whenever you see a loop modifying a DataFrame, ask: "Can I identify all the rows I need to modify, then update them all at once?" That's vectorization.
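A runnable sketch of that principle, with hypothetical frame and column names mirroring the snippets above:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Hypothetical frame indexed by business day with an 'nopnl' flag column.
days = pd.date_range("2024-01-01", "2024-01-31", freq=BDay())
df = pd.DataFrame({"nopnl": 1}, index=days)

# Dates on which to clear the flag (stand-in for opt['cond_exit'].index).
cond_exit_days = pd.to_datetime(["2024-01-05", "2024-01-10", "2024-06-01"])

# Identify all rows to modify in one C-level set operation,
# then update them all at once - no Python loop.
update_days = df.index.intersection(cond_exit_days)
df.loc[update_days, "nopnl"] = 0
# 2024-06-01 falls outside the range, so only two flags are cleared.
```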