How to Read Benchmark Metrics

The Benchmark Metrics block summarizes the Walk-Forward Analysis (WFA) validation outcomes that drive the benchmark verdict and the Kill Switch (see [Benchmark Analytics] in methodology).

It answers one question: do the out-of-sample (OOS) and period-level metrics support a positive benchmark verdict?

[A] OOS-only metrics

These are equity-based out-of-sample metrics (e.g. OOS Sharpe, OOS Calmar, OOS Max Drawdown) computed from the stitched validation curve. They reflect validation performance only, with no in-sample data mixed in.
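
As a rough illustration of how such metrics can be derived from a stitched OOS equity curve, here is a minimal Python sketch. It is not the platform's implementation: the function name oos_metrics, the periods_per_year default, the zero risk-free rate, and the CAGR-over-drawdown form of Calmar are all assumptions made for this example.

```python
import math
from statistics import mean, stdev


def oos_metrics(equity, periods_per_year=365):
    """Compute Sharpe, max drawdown, and Calmar from an equity curve.

    equity: list of positive equity values, one per period, taken from the
    stitched OOS curve. Assumes a zero risk-free rate (illustrative choice).
    """
    if len(equity) < 3:
        raise ValueError("need at least 3 equity points")

    # Simple per-period returns.
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]

    # Annualized Sharpe ratio with risk-free rate assumed to be 0.
    sd = stdev(returns)
    sharpe = mean(returns) / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")

    # Max drawdown: largest peak-to-trough decline as a fraction of the peak.
    peak = equity[0]
    max_dd = 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, (peak - value) / peak)

    # Calmar here is annualized growth divided by max drawdown (one common form).
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    calmar = cagr / max_dd if max_dd > 0 else float("inf")

    return {"sharpe": sharpe, "max_drawdown": max_dd, "calmar": calmar}
```

For example, oos_metrics([100.0, 110.0, 99.0, 120.0]) reports a max drawdown of about 0.10, because the curve falls from a peak of 110 to 99 before recovering.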

[B] WFA period-level (WFE, Retention, Profitable Windows)

WFE (Walk-Forward Efficiency) is the median, across windows, of the ratio of OOS return to in-sample (IS) return. Only windows with a positive IS return (IS > 0) are included. It measures how well the in-sample edge transfers to out-of-sample data.

WFE > 0.5 is typically required for a pass; higher values indicate stronger transfer.
Low or negative WFE suggests overfitting or unstable edge.
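
The WFE definition above can be sketched in a few lines of Python. This is an illustration under assumptions, not the platform's code: the function name walk_forward_efficiency and the (is_return, oos_return) pair layout are hypothetical, and the sketch assumes both returns in a pair are measured on a comparable basis (e.g. both annualized).

```python
from statistics import median


def walk_forward_efficiency(windows):
    """Median OOS/IS return ratio over windows whose IS return is positive.

    windows: list of (is_return, oos_return) pairs, one per WFA window.
    Returns None when no window has a positive IS return.
    """
    ratios = [oos / is_ret for is_ret, oos in windows if is_ret > 0]
    if not ratios:
        return None
    return median(ratios)


# Hypothetical per-window returns; the third window is skipped because IS <= 0.
example = [(0.10, 0.06), (0.20, 0.05), (-0.05, 0.02), (0.08, 0.08)]
wfe = walk_forward_efficiency(example)
passes_wfe = wfe is not None and wfe > 0.5  # 0.5 is the typical threshold named above
```

In this example the included ratios are roughly 0.6, 0.25, and 1.0, so the median sits at about 0.6 and clears the 0.5 threshold. Using the median rather than the mean keeps a single outlier window from dominating the result.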

OOS Retention summarizes how many windows retain positive performance out of sample, and Profitable Windows summarizes how many windows are profitable. Both feed into the overall verdict and the Kill Switch.
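
One way to read these two counts is sketched below. The exact definitions are given in the methodology, so treat this as an assumption: here a window counts as "profitable" when its OOS return is positive, and as "retained" when both its IS and OOS returns are positive. The function name window_summary and the input layout are hypothetical.

```python
def window_summary(windows):
    """Count profitable and retained windows.

    windows: list of (is_return, oos_return) pairs, one per WFA window.
    Assumed definitions (not confirmed by the methodology):
      profitable = OOS return > 0
      retained   = IS return > 0 and OOS return > 0
    """
    profitable = sum(1 for _, oos in windows if oos > 0)
    is_positive = [(is_ret, oos) for is_ret, oos in windows if is_ret > 0]
    retained = sum(1 for _, oos in is_positive if oos > 0)
    return {
        "total_windows": len(windows),
        "profitable_windows": profitable,
        "is_positive_windows": len(is_positive),
        "retained_windows": retained,
    }


# Hypothetical per-window returns.
summary = window_summary([(0.10, 0.06), (0.20, -0.05), (-0.05, 0.02), (0.08, 0.08)])
```

With these inputs, 3 of 4 windows are profitable out of sample, and 2 of the 3 windows that were positive in sample stay positive out of sample.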

[C] Full backtest context

These are the Sharpe and Calmar ratios computed over the entire backtest period. They serve as context alongside the OOS and period-level metrics: the Kill Switch and verdict rely primarily on [A] and [B], while [C] provides a full-period reference.

Pass Probability & Verdict

The Bayesian pass probability estimates the likelihood that the strategy would pass validation if it were re-run. The benchmark verdict (e.g. PASS / FAIL) is derived from WFE, retention, and other thresholds, and it can block deployment when the metrics are insufficient.
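
The guide does not say how the pass probability is computed. To give an intuition for what a Bayesian estimate of this kind looks like, the sketch below uses a simple Beta-Binomial model: each window is treated as an independent pass/fail trial, a Beta prior is updated with the observed counts, and the posterior mean is reported. The model, the uniform Beta(1, 1) prior, and the function name pass_probability are all assumptions for illustration and may differ from the actual method.

```python
def pass_probability(passed, total, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a pass rate under a Beta(prior_a, prior_b) prior.

    passed: number of windows that met the (hypothetical) pass criterion.
    total:  number of windows evaluated.
    Illustrative model only; not the platform's documented formula.
    """
    if total < 0 or not 0 <= passed <= total:
        raise ValueError("require 0 <= passed <= total")
    return (prior_a + passed) / (prior_a + prior_b + total)


# Hypothetical counts: 3 of 4 windows passed.
p = pass_probability(3, 4)
```

With 3 passes out of 4 windows and a uniform prior, the estimate is (1 + 3) / (2 + 4), or about 0.67. The prior pulls small samples toward 0.5, which is why the result is lower than the raw 75% pass rate.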

Regime buckets (Trend, Range, HighVol)

When available, metrics are broken down by regime (Trend, Range, High Volatility). This helps assess whether the strategy holds up across different market environments or depends on a single regime.
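
A regime breakdown of this kind amounts to grouping per-period results by a regime label and summarizing each group. The sketch below assumes each return already carries a label ("Trend", "Range", or "HighVol"); how the labels are assigned is not covered here, and the function name returns_by_regime is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def returns_by_regime(returns, regimes):
    """Group per-period returns by regime label and summarize each bucket.

    returns: list of per-period returns.
    regimes: list of labels of the same length, e.g. "Trend", "Range", "HighVol".
    """
    if len(returns) != len(regimes):
        raise ValueError("returns and regimes must have the same length")

    buckets = defaultdict(list)
    for r, regime in zip(returns, regimes):
        buckets[regime].append(r)

    return {
        regime: {"n": len(rs), "mean_return": mean(rs)}
        for regime, rs in buckets.items()
    }


# Hypothetical data: two Trend periods, one Range, one HighVol.
breakdown = returns_by_regime(
    [0.02, -0.01, 0.03, 0.00],
    ["Trend", "Range", "Trend", "HighVol"],
)
```

A strategy whose positive mean return appears in only one bucket, with few observations in the others, is the single-regime dependence this breakdown is meant to expose.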

See also [Guide: Benchmark Comparison] for Alpha, Beta, Net Edge vs BTC Buy & Hold.

Methodology - formulas, glossary, and FAQ.