Why your walk-forward results look different every time you run them
Non-determinism, seeds, floating-point drift, and pipeline differences that change walk-forward outputs; how reproducibility works in engine-backed workflows.
Non-deterministic walk-forward behavior and reproducible walk-forward results are two sides of the same problem. If your walk-forward numbers move between runs, do not panic, but do classify the cause: some differences are harmless noise, others point to a broken pipeline.
Benign causes
- Floating-point ordering differences
- Parallel optimization without fixed seeds
- Library updates changing numerics
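The first benign cause is easy to reproduce in plain Python. The sketch below uses synthetic returns (not real data) and an explicit left-to-right loop, because the built-in `sum()` has used compensated summation for floats since Python 3.12. Adding the same values in a different order, as a parallel reduction might, can shift the last bits of the total. `math.fsum` computes a correctly rounded sum on standard IEEE-754 platforms, so its result does not depend on order.

```python
import math
import random


def naive_sum(values):
    """Uncompensated left-to-right accumulation: one rounding step per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Synthetic per-candle returns for one window (illustrative data only).
rng = random.Random(42)
returns = [rng.uniform(-0.02, 0.02) for _ in range(10_000)]

# The same values in a different order, as parallel workers might combine them.
shuffled = returns[:]
rng.shuffle(shuffled)

# Naive totals may differ in the last bits depending on order.
forward_total = naive_sum(returns)
shuffled_total = naive_sum(shuffled)

# fsum rounds the exact sum once, so both orders give the same result.
exact_forward = math.fsum(returns)
exact_shuffled = math.fsum(shuffled)

print(f"naive: {forward_total!r} vs {shuffled_total!r}")
print(f"fsum:  {exact_forward!r} vs {exact_shuffled!r}")

# The smallest possible case: floating-point addition is not associative.
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # prints 0.6000000000000001 0.6
```

Differences of this size are the "last decimal" kind of drift: real, but far too small to move a walk-forward verdict.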
Serious causes
- Non-deterministic data ordering
- Look-ahead leakage (look-ahead bias) whose effect changes as window boundaries shift
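One way to catch look-ahead leakage is a truncation test: compute a feature on the full history, recompute it on the history cut off at candle `t`, and compare the value at `t`. A walk-forward window ending at `t` sees exactly the truncated history, so any mismatch means the feature reads future candles. The indicator functions below are hypothetical stand-ins for strategy code, using plain lists instead of a dataframe; this is a minimal sketch of the idea, not Freqtrade's own analysis tooling.

```python
from typing import Callable, List, Optional

Series = List[Optional[float]]


def trailing_mean(closes: List[float], window: int = 3) -> Series:
    """Causal: the value at i uses only closes[i - window + 1 .. i]."""
    return [
        None if i + 1 < window else sum(closes[i - window + 1 : i + 1]) / window
        for i in range(len(closes))
    ]


def centered_mean(closes: List[float], window: int = 3) -> Series:
    """Leaky: the value at i also averages candles after i."""
    half = window // 2
    return [
        None
        if i - half < 0 or i + half + 1 > len(closes)
        else sum(closes[i - half : i + half + 1]) / window
        for i in range(len(closes))
    ]


def first_leak(
    feature: Callable[[List[float]], Series], closes: List[float], warmup: int = 5
) -> Optional[int]:
    """Index of the first candle whose feature value changes once later candles exist."""
    full = feature(closes)
    for t in range(warmup, len(closes)):
        # Recompute as if the window ended at t, then compare with the full-history value.
        if feature(closes[: t + 1])[t] != full[t]:
            return t
    return None


closes = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0, 15.0]
print(first_leak(trailing_mean, closes))  # None: stable at any window boundary
print(first_leak(centered_mean, closes))  # 5: the first index checked already leaks
```

The leaky feature's in-sample numbers depend on where the window ends, which is why this class of bug shows up as results that move when the schedule moves.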
What to freeze
- Data revision and timestamps
- Random seeds for any optimization inside IS
- Versions of Freqtrade, Python, and analysis libraries
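A minimal sketch of freezing these items into a run manifest, assuming raw candle data lives in local files. File hashes pin the data revision, `importlib.metadata` records package versions (returning `None` for anything not installed), and per-window seeds are derived from a base seed with SHA-256 rather than Python's built-in `hash()`, which is salted per process for strings. The helper names and the default package list are illustrative assumptions, not a Freqtrade API.

```python
import hashlib
import json
import platform
from importlib import metadata
from pathlib import Path
from typing import Dict, Iterable, Optional


def file_sha256(path: Path) -> str:
    """Hash file contents in 1 MiB chunks; any data revision changes the digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def package_version(name: str) -> Optional[str]:
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def derive_window_seed(base_seed: int, window_index: int) -> int:
    """Stable 32-bit seed per window, identical across processes and machines."""
    digest = hashlib.sha256(f"{base_seed}:{window_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def build_manifest(
    data_files: Iterable[str],
    base_seed: int,
    packages=("freqtrade", "numpy", "pandas"),  # hypothetical default list
) -> Dict:
    return {
        "data": {str(p): file_sha256(p) for p in sorted(Path(f) for f in data_files)},
        "base_seed": base_seed,
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


def manifest_hash(manifest: Dict) -> str:
    """One digest for the whole manifest; sort_keys makes it independent of dict order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Inside each in-sample optimization, a generator such as `random.Random(derive_window_seed(42, i))` gives window `i` the same seed no matter which worker runs it or in what order, which removes the "parallel optimization without fixed seeds" source of drift.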
How to debug differences systematically
If two runs disagree, diff the inputs first: candle files, pair list, fee table, and strategy config hash. Then diff the window schedule (same start/end, same step). Only after inputs match should you suspect numerical drift.
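That ordering can be encoded so a comparison always reports the earliest stage that disagrees. The record keys (`candle_files`, `pair_list`, `fee_table`, `config_hash`, `window_schedule`) and the rolling schedule parameters are hypothetical; this sketch assumes each run stores such a record, and you would adapt the keys to whatever your pipeline actually saves.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Window = Tuple[datetime, datetime, datetime]


def window_schedule(
    start: datetime, end: datetime, is_days: int, oos_days: int, step_days: int
) -> List[Window]:
    """Rolling (is_start, is_end, oos_end) windows that fit entirely before `end`."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    windows = []
    is_start = start
    while True:
        is_end = is_start + timedelta(days=is_days)
        oos_end = is_end + timedelta(days=oos_days)
        if oos_end > end:
            return windows
        windows.append((is_start, is_end, oos_end))
        is_start += timedelta(days=step_days)


# Inputs are checked before the schedule; numerics are only suspected after both match.
STAGES = (
    ("inputs", ("candle_files", "pair_list", "fee_table", "config_hash")),
    ("schedule", ("window_schedule",)),
)


def first_divergence(run_a: Dict, run_b: Dict) -> Optional[Tuple[str, str]]:
    """Return (stage, key) for the earliest mismatch, or None if inputs and schedule agree."""
    for stage, keys in STAGES:
        for key in keys:
            if run_a.get(key) != run_b.get(key):
                return stage, key
    return None


start = datetime(2023, 1, 1)
run_a = {
    "candle_files": "digest-a",  # placeholder values, not real hashes
    "pair_list": ["BTC/USDT", "ETH/USDT"],
    "fee_table": "fees-v1",
    "config_hash": "cfg-1",
    "window_schedule": window_schedule(start, datetime(2023, 12, 31), 90, 30, 30),
}
run_b = dict(run_a, window_schedule=window_schedule(start, datetime(2023, 11, 30), 90, 30, 30))
print(first_divergence(run_a, run_b))  # ('schedule', 'window_schedule')
```

Comparing `pair_list` as an ordered list is deliberate: a pair list that arrives in a different order is itself a difference worth catching.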
Small differences vs big differences
Tiny differences in the last few decimal places of a ratio are often benign. Large jumps in WFE (walk-forward efficiency), retention, or profit usually mean the pipeline changed, not that the market changed between yesterday and today.
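A sketch of turning "tiny" and "large" into explicit buckets based on relative change. The thresholds (`noise_tol=1e-9`, and `pipeline_tol=0.01`, i.e. a 1% relative change) are arbitrary assumptions chosen for illustration, and the metric values in the example are made up; pick tolerances that fit your own metrics and record them in the runbook.

```python
from typing import Dict


def relative_change(a: float, b: float) -> float:
    """Absolute difference scaled by the larger magnitude; 0.0 when both are zero."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0 else abs(a - b) / scale


def classify_drift(a: float, b: float, noise_tol: float = 1e-9, pipeline_tol: float = 0.01) -> str:
    """Bucket one metric's difference between two runs (illustrative thresholds)."""
    rel = relative_change(a, b)
    if rel <= noise_tol:
        return "noise"
    if rel < pipeline_tol:
        return "investigate"
    return "pipeline change"


def compare_runs(run_a: Dict[str, float], run_b: Dict[str, float], **tols) -> Dict[str, str]:
    return {name: classify_drift(run_a[name], run_b[name], **tols) for name in run_a}


# Hypothetical metrics from two runs of the same walk-forward job.
run_a = {"wfe": 0.62, "profit_pct": 14.2, "sharpe": 1.2345678901}
run_b = {"wfe": 0.48, "profit_pct": 14.25, "sharpe": 1.2345678902}
print(compare_runs(run_a, run_b))
# {'wfe': 'pipeline change', 'profit_pct': 'investigate', 'sharpe': 'noise'}
```

Relative rather than absolute change keeps one rule usable across metrics with very different scales, such as a ratio near 1 and a profit figure in percent.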
Reproducibility checklist for teams
Write a one-page runbook: command lines, environment variables, pinned dependency versions, and where raw data lives. Reproducibility is how you convert debugging from art into engineering.
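Parts of the runbook can be generated rather than typed. The sketch below is an assumption about what is worth capturing, not a standard format: it records the invoking command line, selected environment variables, and `name==version` pins for every installed distribution. `PYTHONHASHSEED` is included because it controls string hashing, and therefore the iteration order of any `set` of pair names; it only takes effect if set before the interpreter starts.

```python
import os
import shlex
import sys
from importlib import metadata


def runbook_header(env_vars=("PYTHONHASHSEED",)) -> str:
    """Plain-text runbook header: command line, selected env vars, and package pins."""
    lines = ["# Command", shlex.join(sys.argv), "", "# Environment"]
    for name in env_vars:
        lines.append(f"{name}={os.environ.get(name, '<unset>')}")
    lines += ["", "# Pinned packages"]
    seen = set()
    pins = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip nameless or duplicate entries (the same package can appear on several paths).
        if name and name.lower() not in seen:
            seen.add(name.lower())
            pins.append(f"{name}=={dist.version}")
    lines += sorted(pins, key=str.lower)
    return "\n".join(lines)


if __name__ == "__main__":
    print(runbook_header())
```

The generated part covers command lines, environment, and versions; where the raw data lives still has to be written down by hand.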