Why your walk-forward results look different every time you run
Non-determinism, seeds, floating-point drift, and pipeline differences that change walk-forward outputs; how reproducibility works in engine-backed workflows.
Non-deterministic walk-forward behavior and reproducible walk-forward results are two sides of the same problem. If your walk-forward numbers move between runs, do not panic, but do classify the cause before you trust either result.
Benign causes
- Floating-point ordering differences
- Parallel optimization without fixed seeds
- Library updates changing numerics
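To make the first item concrete, here is a minimal sketch of how summation order alone changes the last digit of a result. The three values are illustrative, not taken from any real backtest.

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, different grouping: the rounding error lands in a different place.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])
order_matters = left_to_right != right_to_left

# math.fsum tracks partial sums exactly, so its result does not depend on order.
stable_forward = math.fsum(values)
stable_reverse = math.fsum(reversed(values))

print(order_matters)                      # True
print(stable_forward == stable_reverse)   # True
```

A parallel reduction that combines partial sums in a different order on each run produces this kind of drift. It is confined to the final digits.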
Serious causes
- Non-deterministic data ordering
- Look-ahead leakage (look-ahead bias) that changes when window boundaries shift
What to freeze
- Data revision and timestamps
- Random seeds for any optimization inside the in-sample (IS) window
- Versions of Freqtrade, Python, and analysis libraries
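One way to freeze these items is to write them into a small manifest at the start of every run. The sketch below assumes a hypothetical helper, capture_run_manifest, and a hypothetical data_revision label; neither is part of Freqtrade. It only seeds the standard library random module, so NumPy or any optimizer you use would need its own seed.

```python
import json
import platform
import random
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_run_manifest(seed, data_revision, packages=("freqtrade", "numpy", "pandas")):
    """Seed the stdlib RNG and record what this run depended on (hypothetical helper)."""
    random.seed(seed)
    return {
        "seed": seed,
        "data_revision": data_revision,  # assumed label for the frozen data snapshot
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


manifest = capture_run_manifest(seed=42, data_revision="2024-01-31T00:00:00Z")
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing this manifest next to the results lets you compare two runs' environments before you compare their numbers.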
How to debug differences systematically
If two runs disagree, diff the inputs first: candle files, pair list, fee table, and strategy config hash. Then diff the window schedule (same start/end, same step). Only after inputs match should you suspect numerical drift.
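The input diff can be sketched as a set of content hashes, one per input group, compared between runs. The function names and the example fee, config, and schedule values are assumptions made for illustration. Candle files are keyed by file name, which assumes names are unique within a run.

```python
import hashlib
import json
import os


def sha256_file(path):
    """Hash a file in chunks so large candle files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_json(value):
    """Hash a JSON-serializable value using a canonical, key-sorted encoding."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint_inputs(candle_files, pair_list, fee_table, strategy_config, schedule):
    """Build one hash per input group; the pair list is hashed in the order given."""
    candles = {os.path.basename(str(p)): sha256_file(p) for p in candle_files}
    return {
        "candles": sha256_json(candles),
        "pair_list": sha256_json(list(pair_list)),
        "fee_table": sha256_json(fee_table),
        "strategy_config": sha256_json(strategy_config),
        "window_schedule": sha256_json(schedule),
    }


def diff_fingerprints(first, second):
    """Return the names of the input groups whose hashes differ."""
    return sorted(k for k in first.keys() | second.keys() if first.get(k) != second.get(k))


schedule = {"start": "2023-01-01", "end": "2023-07-01", "step_days": 30}
pairs = ["BTC/USDT", "ETH/USDT"]
run_a = fingerprint_inputs([], pairs, {"taker": 0.001}, {"stoploss": -0.1}, schedule)
run_b = fingerprint_inputs([], pairs, {"taker": 0.0015}, {"stoploss": -0.1}, schedule)
print(diff_fingerprints(run_a, run_b))  # ['fee_table']
```

An empty diff means the inputs match, and only then is numerical drift worth investigating.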
Small differences vs big differences
Tiny differences in the last decimal of a ratio are often benign. Large jumps in walk-forward efficiency (WFE), retention, or profit usually mean the pipeline changed, not that the market changed between yesterday and today.
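A tolerance-based comparison is one way to separate the two cases mechanically. In the sketch below, the tolerances and the metric values are assumptions chosen for illustration; pick thresholds that suit your own metrics.

```python
import math


def classify_metric_drift(run_a, run_b, rel_tol=1e-9, abs_tol=1e-12):
    """Label each metric as 'benign', 'investigate', or 'missing' between two runs."""
    report = {}
    for name in sorted(run_a.keys() | run_b.keys()):
        if name not in run_a or name not in run_b:
            report[name] = "missing"
        elif math.isclose(run_a[name], run_b[name], rel_tol=rel_tol, abs_tol=abs_tol):
            report[name] = "benign"
        else:
            report[name] = "investigate"
    return report


yesterday = {"wfe": 0.62, "profit": 1523.40}
today = {"wfe": 0.62 + 1e-15, "profit": 1480.10}
report = classify_metric_drift(yesterday, today)
print(report)  # {'profit': 'investigate', 'wfe': 'benign'}
```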
Reproducibility checklist for teams
Write a one-page runbook: command lines, environment variables, pinned dependency versions, and where raw data lives. Reproducibility is how you convert debugging from art into engineering.
Container images and hashes
If you run walk-forward in Docker, record:
- image digest
- build arguments
- mount paths for data volumes
This removes an entire class of "works on my machine" failures.
Deterministic parallelism
If you parallelize optimization, use a scheduler that preserves reproducible ordering of accepted candidates, or fix seeds per shard.
Race conditions in "pick the best" logic are more common than most teams admit.
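Both ideas can be sketched with the standard library: derive each shard's seed from a base seed and the shard id, and break ties in "pick the best" by the parameters themselves rather than by arrival order. The names shard_rng and pick_best and the parameter names are hypothetical.

```python
import hashlib
import json
import random


def shard_rng(base_seed, shard_id):
    """Return an RNG whose seed depends only on the base seed and the shard id."""
    material = f"{base_seed}:{shard_id}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(material).digest()[:8], "big")
    return random.Random(seed)


def pick_best(candidates):
    """Pick the highest score; ties are broken by the canonical JSON of the parameters."""
    def rank(candidate):
        score, params = candidate
        return (-score, json.dumps(params, sort_keys=True))
    return min(candidates, key=rank)


candidates = [
    (1.25, {"ema_fast": 12, "ema_slow": 26}),
    (1.25, {"ema_fast": 10, "ema_slow": 30}),
    (0.90, {"ema_fast": 8, "ema_slow": 21}),
]
best = pick_best(candidates)
best_reversed = pick_best(list(reversed(candidates)))
print(best == best_reversed)  # True: the winner does not depend on arrival order
```

Because the ranking key is a pure function of each candidate, it does not matter which worker finishes first.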
Wall-clock leakage in window builders
If your code uses "today" to clip the last window, two runs on different calendar days will silently shift the schedule.
Prefer explicit end timestamps chosen up front, then freeze them in the run artifact.
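A window builder that takes its end timestamp as an argument, and never reads the clock, might look like the sketch below. The function name, the day-based window sizes, and the example dates are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_windows(start, end, is_days, oos_days, step_days):
    """Build in-sample/out-of-sample windows from explicit bounds; no call to 'now'."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    windows = []
    is_start = start
    while True:
        is_end = is_start + timedelta(days=is_days)
        oos_end = is_end + timedelta(days=oos_days)
        if oos_end > end:
            break  # drop any window that would run past the frozen end
        windows.append({
            "is_start": is_start.isoformat(),
            "is_end": is_end.isoformat(),
            "oos_end": oos_end.isoformat(),
        })
        is_start += timedelta(days=step_days)
    return windows


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 7, 1, tzinfo=timezone.utc)
windows = build_windows(start, end, is_days=90, oos_days=30, step_days=30)
print(len(windows))  # 3
```

Because the returned schedule is a list of ISO timestamp strings, it can be written to the run artifact unchanged and compared between runs.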