Non-deterministic walk-forward behavior and reproducible walk-forward results are two sides of the same problem. If your walk-forward numbers move between runs, do not panic, but do classify the cause.

Benign causes

Floating-point ordering differences
Parallel optimization without fixed seeds
Library updates changing numerics

Serious causes

Non-deterministic data ordering
Look-ahead leakage (look-ahead bias) whose effect changes when window boundaries shift

What to freeze

Data revision and timestamps
Random seeds for any optimization inside the in-sample (IS) window
Versions of Freqtrade, Python, and analysis libraries
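
One way to freeze the items above is to write them into a small manifest stored next to the results. The sketch below is illustrative only: build_manifest, the package list, the seed value, and the data revision label are all hypothetical names chosen for this example, not part of Freqtrade.

```python
import json
import platform
import random
from importlib import metadata


def build_manifest(seed, packages, data_revision):
    """Collect the values worth freezing for one walk-forward run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python": platform.python_version(),
        "packages": versions,
        "seed": seed,
        "data_revision": data_revision,
    }


SEED = 42  # hypothetical value; any fixed integer works
random.seed(SEED)  # seed the stdlib RNG; other libraries need their own seeding

# "candles-2024-01-31" is a made-up label for a frozen data snapshot.
manifest = build_manifest(SEED, ["freqtrade", "numpy", "pandas"], "candles-2024-01-31")
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Saving this JSON with every run gives two disagreeing runs something concrete to compare before anyone looks at the strategy itself.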

How to debug differences systematically

If two runs disagree, diff the inputs first: candle files, pair list, fee table, and strategy config hash. Then diff the window schedule (same start/end, same step). Only after inputs match should you suspect numerical drift.
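
The "diff the inputs first" step can be sketched with content hashes. This is a minimal example under assumptions: the helper names, the fingerprint keys, and the sample values are hypothetical, and file_sha256 is shown for candle files but not exercised here.

```python
import hashlib
import json


def file_sha256(path):
    """Hash a file (for example a candle file) in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def config_hash(config):
    """Hash a JSON-serializable object independent of dict key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_fingerprints(a, b):
    """Return the names of inputs whose fingerprints differ between runs."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


# Hypothetical fingerprints for two runs; only the fee table differs.
run_a = {
    "pairs": config_hash(["BTC/USDT", "ETH/USDT"]),
    "fees": config_hash({"taker": 0.001}),
    "windows": config_hash({"start": "2023-01-01", "end": "2023-12-31", "step_days": 30}),
}
run_b = dict(run_a, fees=config_hash({"taker": 0.00075}))

print(diff_fingerprints(run_a, run_b))  # ['fees']
```

An empty list means the recorded inputs match, and only then is numerical drift worth suspecting.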

Small differences vs big differences

Tiny differences in the last decimal of a ratio are often benign. Large jumps in WFE, retention, or profit usually mean the pipeline changed, not that the market changed between yesterday and today.
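
The small-versus-big distinction can be made mechanical with a tolerance check. The sketch below assumes metrics are stored as plain floats in a dict; the function name, the tolerance values, and the sample numbers are illustrative assumptions, and sensible tolerances depend on your own metrics.

```python
import math


def classify_drift(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """Label each metric as identical, numerical noise, or worth investigating."""
    report = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old or name not in new:
            report[name] = "missing"
        elif old[name] == new[name]:
            report[name] = "identical"
        elif math.isclose(old[name], new[name], rel_tol=rel_tol, abs_tol=abs_tol):
            report[name] = "numerical noise"
        else:
            report[name] = "investigate"
    return report


# Made-up metrics from two runs of the same schedule.
yesterday = {"wfe": 0.62, "profit": 1234.5, "retention": 0.8}
today = {"wfe": 0.62 + 1e-15, "profit": 1100.0, "retention": 0.8}

for metric, verdict in classify_drift(yesterday, today).items():
    print(f"{metric}: {verdict}")
```

Anything labeled "investigate" points back to the input diff, not to the market.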

Reproducibility checklist for teams

Write a one-page runbook: command lines, environment variables, pinned dependency versions, and where raw data lives. Reproducibility is how you convert debugging from art into engineering.

Container images and hashes

If you run walk-forward in Docker, record:

image digest
build arguments
mount paths for data volumes

This removes an entire class of "works on my machine" failures.

Deterministic parallelism

If you parallelize optimization, use a scheduler that preserves reproducible ordering of accepted candidates, or fix seeds per shard.

Race conditions in "pick the best" logic are more common than most teams admit.
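
A minimal sketch of both ideas, assuming each candidate is a (score, shard, index) tuple: seeds are derived per shard, and the best candidate is chosen with an explicit tie-break so the result does not depend on which worker finished first. The seed derivation and the random scores are hypothetical stand-ins for a real optimizer.

```python
import random


def shard_seed(base_seed, shard_index):
    """Derive a distinct, repeatable seed for each shard (illustrative scheme)."""
    return base_seed * 1_000_003 + shard_index


def evaluate_shard(base_seed, shard_index, n_candidates=5):
    """Stand-in for real optimization: random scores from a per-shard RNG."""
    rng = random.Random(shard_seed(base_seed, shard_index))
    return [(round(rng.uniform(0, 1), 6), shard_index, i) for i in range(n_candidates)]


def pick_best(results):
    """Highest score wins; ties go to the lowest (shard, index) pair."""
    return min(results, key=lambda r: (-r[0], r[1], r[2]))


BASE_SEED = 7  # hypothetical value
results = [row for shard in range(4) for row in evaluate_shard(BASE_SEED, shard)]

# Simulate workers finishing in a different order on another run.
arrival_order = list(reversed(results))

print(pick_best(results) == pick_best(arrival_order))  # True
```

Without the tie-break, a "first seen wins" comparison would return different candidates for equal scores depending on arrival order.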

Wall-clock leakage in window builders

If your code uses "today" to clip the last window, two runs on different calendar days will silently shift the schedule.

Prefer explicit end timestamps chosen up front, then freeze them in the run artifact.
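
A window builder that takes its end timestamp as an argument, and never calls the clock, might look like the sketch below. The function name, the window lengths, and the dates are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_windows(start, end, is_days, oos_days, step_days):
    """Return (is_start, is_end, oos_end) tuples that fit fully before `end`."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    windows = []
    cursor = start
    while cursor + timedelta(days=is_days + oos_days) <= end:
        is_end = cursor + timedelta(days=is_days)
        oos_end = is_end + timedelta(days=oos_days)
        windows.append((cursor, is_end, oos_end))
        cursor += timedelta(days=step_days)
    return windows


# Explicit, frozen boundaries; no datetime.now() anywhere in the builder.
START = datetime(2023, 1, 1, tzinfo=timezone.utc)
END = datetime(2023, 12, 31, tzinfo=timezone.utc)

windows = build_windows(START, END, is_days=90, oos_days=30, step_days=30)
for is_start, is_end, oos_end in windows:
    print(is_start.date(), is_end.date(), oos_end.date())
```

Because START and END are inputs, the same call produces the same schedule on any calendar day, and both values can be written into the run artifact.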