Walk-forward analysis (WFA) is not a single function call. It is a pipeline: ingest bars, align timestamps, split history into rolling in-sample (IS) and out-of-sample (OOS) windows, run a strategy evaluator, aggregate metrics, and persist artifacts so you can audit decisions later.

This guide shows how to build that pipeline in TypeScript with a senior-engineering mindset: strict types, deterministic ordering, and tests that fail loudly when you leak future data.

What you are building

A minimal WFA pipeline has five stages:

Data layer - canonical candle schema, timezone rules, missing-bar policy.
Split engine - anchored or rolling windows, step size, embargo if you simulate fills on the boundary.
Evaluator - turns IS parameters (or a search space) into OOS trades or equity curves.
Metrics - walk-forward efficiency (WFE), OOS retention, drawdown, turnover, cost drag, stability checks.
Reporting - JSON artifacts + human-readable summary tables.

If any stage is fuzzy, your WFA will look scientific while silently encoding look-ahead bias.

Step 0: freeze definitions

Before code, write down:

Bar timeframe and whether you allow forward-filled candles.
Trade accounting - fees, slippage model, position sizing rule.
Parameter policy - what you optimize on IS only, and what is frozen on OOS.
Randomness - if hyperparameter search is stochastic, fix seeds per window.

These choices belong in a RunManifest object you persist next to results.
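One possible shape for that manifest is sketched below. The guide names RunManifest but does not define it, so every field name here, and the seedForWindow helper, is an assumption made for illustration.

```typescript
// Hypothetical manifest shape: every field name below is an assumption,
// chosen to mirror the four decisions listed above.
interface RunManifest {
  readonly timeframe: string;                 // e.g. "1h"
  readonly allowForwardFilledCandles: boolean;
  readonly feeBps: number;                    // fee per fill, in basis points
  readonly slippageModel: string;             // label for the slippage model in use
  readonly positionSizing: string;            // label for the sizing rule
  readonly optimizedOnIs: readonly string[];  // parameter names searched on IS only
  readonly frozenOnOos: readonly string[];    // parameter names fixed before OOS runs
  readonly baseSeed: number;                  // root seed for any stochastic search
}

// Derive a distinct, repeatable seed per window from the root seed.
// The multiplier is an arbitrary prime; any fixed mixing rule would do.
function seedForWindow(manifest: RunManifest, windowIndex: number): number {
  return (manifest.baseSeed + windowIndex * 7919) >>> 0; // coerce to uint32
}

const exampleManifest: RunManifest = {
  timeframe: "1h",
  allowForwardFilledCandles: false,
  feeBps: 5,
  slippageModel: "fixed-2bps",
  positionSizing: "fixed-fraction-1pct",
  optimizedOnIs: ["fastPeriod", "slowPeriod"],
  frozenOnOos: ["fastPeriod", "slowPeriod", "stopLossPct"],
  baseSeed: 42,
};
```

Keeping the seed rule next to the manifest means a rerun of window 7 uses the same seed as the original run, without anyone having to remember it.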

Step 1: canonical types

Start with boring types. Boring types prevent subtle bugs.

Rules:

Never mix local time and UTC in the same pipeline. Pick UTC end-to-end.
Validate strictly: high >= max(open, close) style checks catch bad vendor feeds early.
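A minimal sketch of a candle type that follows these rules, assuming timestamps are stored as UTC epoch milliseconds. The Candle field names and both helper functions are illustrative choices, not a fixed schema.

```typescript
// Assumed canonical candle: one bar, timestamped in UTC epoch milliseconds.
interface Candle {
  readonly openTimeMs: number;
  readonly open: number;
  readonly high: number;
  readonly low: number;
  readonly close: number;
  readonly volume: number;
}

// Returns a list of problems; an empty list means the candle passed.
function validateCandle(c: Candle): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(c.openTimeMs) || c.openTimeMs < 0) {
    errors.push("openTimeMs must be a non-negative integer");
  }
  if (![c.open, c.high, c.low, c.close].every((p) => Number.isFinite(p))) {
    errors.push("prices must be finite");
  }
  if (c.high < Math.max(c.open, c.close)) errors.push("high < max(open, close)");
  if (c.low > Math.min(c.open, c.close)) errors.push("low > min(open, close)");
  if (!Number.isFinite(c.volume) || c.volume < 0) {
    errors.push("volume must be finite and non-negative");
  }
  return errors;
}

// Deterministic ordering: reject duplicate or out-of-order bars up front.
function assertStrictlyIncreasing(candles: readonly Candle[]): void {
  for (let i = 1; i < candles.length; i++) {
    if (candles[i].openTimeMs <= candles[i - 1].openTimeMs) {
      throw new Error(`bar ${i} is not later than bar ${i - 1}`);
    }
  }
}
```

Returning a list of errors instead of throwing on the first one makes it easier to report everything wrong with a vendor file in a single pass.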

Step 2: build rolling windows

Rolling WFA is a loop with a cursor. Keep the logic pure so you can unit test it without running a strategy.
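The cursor loop could look like the sketch below. It works on bar indices only, so it can be tested without candles or a strategy. The WindowSpec and WfaWindow names are assumptions, as is the half-open convention (start inclusive, end exclusive).

```typescript
// Assumed window spec, measured in bars.
interface WindowSpec {
  readonly isBars: number;   // in-sample length
  readonly oosBars: number;  // out-of-sample length
  readonly stepBars: number; // how far the cursor moves between windows
}

// Half-open index ranges: [isStart, isEnd) and [oosStart, oosEnd).
interface WfaWindow {
  readonly index: number;
  readonly isStart: number;
  readonly isEnd: number;
  readonly oosStart: number;
  readonly oosEnd: number;
}

// Pure function: same inputs always give the same windows, in time order.
function buildRollingWindows(totalBars: number, spec: WindowSpec): WfaWindow[] {
  const sizes = [spec.isBars, spec.oosBars, spec.stepBars];
  if (!sizes.every((n) => Number.isInteger(n) && n > 0)) {
    throw new RangeError("isBars, oosBars and stepBars must be positive integers");
  }
  const windows: WfaWindow[] = [];
  let cursor = 0;
  // Stop as soon as a full IS + OOS pair no longer fits in the history.
  while (cursor + spec.isBars + spec.oosBars <= totalBars) {
    const isEnd = cursor + spec.isBars;
    windows.push({
      index: windows.length,
      isStart: cursor,
      isEnd,
      oosStart: isEnd,
      oosEnd: isEnd + spec.oosBars,
    });
    cursor += spec.stepBars;
  }
  return windows;
}
```

With 10 bars, isBars 4, oosBars 2 and stepBars 2, this yields three windows whose OOS ranges are [4, 6), [6, 8) and [8, 10). An anchored variant would keep isStart at 0 and grow isEnd instead of sliding both.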

Production upgrades you will want next:

Warm-up bars for indicators so IS optimization does not start on uninitialized MACD/RSI state.
Embargo if your trade logic can use information near the IS/OOS boundary.
Liquidity filters applied identically on IS and OOS.
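As a rough sketch of the first two upgrades, warm-up and embargo can be layered on top of an existing window without touching the splitter. The PaddedWindow type and the withWarmupAndEmbargo helper are hypothetical names, and the half-open index convention is an assumption.

```typescript
// Window as half-open index ranges; repeated so the snippet stands alone.
interface WfaWindow {
  readonly index: number;
  readonly isStart: number;
  readonly isEnd: number;
  readonly oosStart: number;
  readonly oosEnd: number;
}

// Same window plus the first bar fed to indicators before scoring starts.
interface PaddedWindow extends WfaWindow {
  readonly warmupStart: number;
}

function withWarmupAndEmbargo(
  w: WfaWindow,
  warmupBars: number,
  embargoBars: number,
): PaddedWindow {
  if (![warmupBars, embargoBars].every((n) => Number.isInteger(n) && n >= 0)) {
    throw new RangeError("warmupBars and embargoBars must be non-negative integers");
  }
  // Embargo: skip the first bars after IS so boundary trades cannot straddle it.
  const oosStart = w.isEnd + embargoBars;
  if (oosStart >= w.oosEnd) {
    throw new RangeError("embargo leaves no OOS bars in window " + w.index);
  }
  // Warm-up: bars before isStart that prime indicators but are never scored.
  const warmupStart = Math.max(0, w.isStart - warmupBars);
  return { ...w, warmupStart, oosStart };
}
```

Warm-up bars only ever come from before isStart, so they add no future information. The embargo shrinks the scored OOS range, which is the price of a cleaner boundary.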

Step 3: isolate the evaluator behind an interface

Your strategy code will change. Your WFA harness should not.
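One way to draw that boundary is a small generic interface with two methods: one that may search on IS data, and one that only scores with given params. The Evaluator interface and the toy buy-and-hold implementation below are assumptions for illustration, not a prescribed API.

```typescript
// Minimal candle for this sketch; a real pipeline would use the full schema.
interface Candle {
  readonly openTimeMs: number;
  readonly close: number;
}

interface EvalResult {
  readonly totalReturn: number; // fraction, e.g. 0.1 for +10%
  readonly tradeCount: number;
}

// The harness depends only on this interface, never on strategy internals.
interface Evaluator<P> {
  readonly version: string;
  // May search freely, but only ever receives IS candles.
  optimize(isCandles: readonly Candle[], seed: number): P;
  // Scores a slice with fixed params; must not search or adapt.
  evaluate(candles: readonly Candle[], params: P): EvalResult;
}

// Toy implementation: buy at the first usable close, sell at the last.
interface HoldParams {
  readonly skipBars: number;
}

const buyAndHold: Evaluator<HoldParams> = {
  version: "toy-0",
  optimize: () => ({ skipBars: 0 }),
  evaluate(candles, params) {
    const held = candles.slice(params.skipBars);
    if (held.length < 2) return { totalReturn: 0, tradeCount: 0 };
    const first = held[0].close;
    const last = held[held.length - 1].close;
    return { totalReturn: last / first - 1, tradeCount: 1 };
  },
};
```

Swapping in a real strategy then means writing a new Evaluator, while the splitter, metrics, and persistence code stay unchanged.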

Pipeline orchestration becomes:

Slice candles for IS range.
Run optimization (grid, Bayesian, whatever) - IS only.
Take best params (or a robustness rule: top-N average).
Slice candles for OOS range.
Evaluate once with frozen params.
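The five steps above can be sketched as a single map over the windows. The types are repeated so the snippet stands alone; runWalkForward, WindowOutcome, and the seed rule (baseSeed plus window index) are assumptions for illustration.

```typescript
interface Candle {
  readonly openTimeMs: number;
  readonly close: number;
}
interface WfaWindow {
  readonly index: number;
  readonly isStart: number;
  readonly isEnd: number;   // exclusive
  readonly oosStart: number;
  readonly oosEnd: number;  // exclusive
}
interface EvalResult {
  readonly totalReturn: number;
}
interface Evaluator<P> {
  optimize(isCandles: readonly Candle[], seed: number): P;
  evaluate(candles: readonly Candle[], params: P): EvalResult;
}
interface WindowOutcome<P> {
  readonly windowIndex: number;
  readonly params: P;
  readonly is: EvalResult;
  readonly oos: EvalResult;
}

function runWalkForward<P>(
  candles: readonly Candle[],
  windows: readonly WfaWindow[],
  evaluator: Evaluator<P>,
  baseSeed: number,
): WindowOutcome<P>[] {
  return windows.map((w) => {
    // 1-2. Slice IS and optimize; the evaluator never sees the full array.
    const isCandles = candles.slice(w.isStart, w.isEnd);
    const params = evaluator.optimize(isCandles, baseSeed + w.index);
    // 3. Freeze the chosen params (shallow) so OOS code cannot mutate them.
    Object.freeze(params);
    // 4-5. Slice OOS and evaluate exactly once with the frozen params.
    const oosCandles = candles.slice(w.oosStart, w.oosEnd);
    return {
      windowIndex: w.index,
      params,
      is: evaluator.evaluate(isCandles, params), // kept for degradation checks
      oos: evaluator.evaluate(oosCandles, params),
    };
  });
}
```

Passing copies made with slice, not the full array plus indices, is a deliberate choice here: the evaluator has no reference through which it could read bars outside its range.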

Step 4: compute metrics that survive skepticism

At minimum, store both:

OOS headline metrics - return, max drawdown, profit factor.
Stability metrics - sensitivity to parameter jitter, degradation vs IS.

If you only store the best OOS window, you are optimizing again - just with extra steps.
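Two of these metrics are small enough to sketch directly. The WFE definition used here, mean OOS return per bar divided by mean IS return per bar with simple (non-compounded) sums across all windows, is one common convention and an assumption of this sketch. The function and field names are also illustrative.

```typescript
// Largest peak-to-trough decline of an equity curve, as a fraction of the peak.
// Assumes strictly positive equity values.
function maxDrawdown(equity: readonly number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const value of equity) {
    if (value > peak) peak = value;
    const drawdown = (peak - value) / peak;
    if (drawdown > worst) worst = drawdown;
  }
  return worst;
}

interface WindowReturns {
  readonly isReturn: number;  // total IS return for the window, as a fraction
  readonly isBars: number;
  readonly oosReturn: number; // total OOS return for the window, as a fraction
  readonly oosBars: number;
}

// WFE over ALL windows, not just the best one.
// Returns null when the IS rate is not positive, since the ratio is meaningless.
function walkForwardEfficiency(windows: readonly WindowReturns[]): number | null {
  let isSum = 0;
  let isBars = 0;
  let oosSum = 0;
  let oosBars = 0;
  for (const w of windows) {
    isSum += w.isReturn;
    isBars += w.isBars;
    oosSum += w.oosReturn;
    oosBars += w.oosBars;
  }
  if (isBars === 0 || oosBars === 0) return null;
  const isRate = isSum / isBars;
  const oosRate = oosSum / oosBars;
  return isRate > 0 ? oosRate / isRate : null;
}
```

For an equity curve of 100, 120, 90, 130 the drawdown is 0.25 (from 120 down to 90). Normalizing by bar count keeps a long IS window from flattering or penalizing the ratio just because it covers more time.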

Step 5: persistence and reproducibility

Write artifacts to disk as JSON Lines or SQLite. Each run should include:

input dataset hash
window spec
evaluator version
dependency versions (package-lock hash)
params per window
metrics per window

This is what lets you answer the question: "Why did this pass last month and fail today?"
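A minimal sketch of the JSON Lines option, assuming Node.js and its built-in crypto module. The WindowRecord shape and the comma-and-newline canonical form used for the dataset hash are assumptions; what matters is that the serialization is fixed and documented.

```typescript
import { createHash } from "node:crypto";

// One record per window; field names are illustrative.
interface WindowRecord {
  readonly datasetHash: string;
  readonly windowSpec: {
    readonly isBars: number;
    readonly oosBars: number;
    readonly stepBars: number;
  };
  readonly evaluatorVersion: string;
  readonly lockfileHash: string; // hash of package-lock.json contents
  readonly windowIndex: number;
  readonly params: Readonly<Record<string, number>>;
  readonly metrics: Readonly<Record<string, number>>;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hash candles through a fixed text form so the same data always
// produces the same hash, regardless of how it was loaded.
function hashDataset(rows: readonly (readonly number[])[]): string {
  return sha256Hex(rows.map((row) => row.join(",")).join("\n"));
}

// JSON Lines: one JSON object per line, each line newline-terminated.
function toJsonLines(records: readonly WindowRecord[]): string {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}
```

The resulting string can be appended to a file with appendFileSync from node:fs. Because each line is a complete record, a crashed run still leaves every finished window readable.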

Step 6: tests that catch look-ahead

Add three tests every serious pipeline needs:

Shuffle test - randomly permute the OOS window labels (not the prices) and confirm the splitter still emits windows in strict time order, with each IS range ending before its OOS range starts.
Leak test - assert that code running on an OOS slice cannot read candle indices beyond oosEnd.
Boundary test - a strategy that can only profit if future data leaks (for example, "buy if the next bar's close is higher") must never show implausibly good OOS performance.
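One way to make the leak and boundary tests mechanical is to hand strategies a view that throws on any read past the current bar. The sketch below assumes such a design; makeCausalView, Signal, and countSignals are hypothetical names, and the view is meant to wrap only the slice for one window, so bars beyond oosEnd are not reachable at all.

```typescript
interface Candle {
  readonly openTimeMs: number;
  readonly close: number;
}

// Exposes only bars at or before the cursor; reading ahead throws.
interface CausalView {
  cursor(): number;
  at(index: number): Candle;
  advance(): boolean; // false once the slice is exhausted
}

function makeCausalView(candles: readonly Candle[]): CausalView {
  if (candles.length === 0) throw new Error("empty slice");
  let cursor = 0;
  return {
    cursor: () => cursor,
    at(index: number): Candle {
      if (!Number.isInteger(index) || index < 0 || index > cursor) {
        throw new RangeError(`look-ahead: index ${index}, cursor ${cursor}`);
      }
      return candles[index];
    },
    advance(): boolean {
      if (cursor + 1 >= candles.length) return false;
      cursor += 1;
      return true;
    },
  };
}

type Signal = (view: CausalView) => boolean;

// The impossible strategy from the boundary test: peeks at the next close.
const clairvoyant: Signal = (v) =>
  v.at(v.cursor() + 1).close > v.at(v.cursor()).close;

// An honest strategy: compares the current close with the previous one.
const momentum: Signal = (v) =>
  v.cursor() > 0 && v.at(v.cursor()).close > v.at(v.cursor() - 1).close;

// Walk the slice bar by bar and count how often the signal fires.
function countSignals(candles: readonly Candle[], signal: Signal): number {
  const view = makeCausalView(candles);
  let fired = 0;
  do {
    if (signal(view)) fired += 1;
  } while (view.advance());
  return fired;
}
```

A test suite built on this would assert that countSignals throws for clairvoyant and returns a normal count for momentum. If the clairvoyant strategy ever runs to completion, the harness is leaking.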

Operational checklist before you trust results

Costs are applied on both IS and OOS with the same model.
Corporate actions / contract rolls are handled or explicitly excluded.
You log the entire hyperparameter search budget (to reason about multiple testing).