KIPLOKS ROBUSTNESS ENGINE
We break strategies before real money does.
Backtests show performance.
Kiploks shows survivability.
Can this strategy survive real capital?
Most tools show how good a strategy looks.
Kiploks shows how easily it breaks.
• robustness across time
• overfitting probability
• parameter fragility
• capital failure thresholds
Before money is deployed.
Kiploks is a strategy robustness engine: it stress-tests trading strategies before live deployment.
Unlike backtesting tools, it delivers algorithmic strategy risk analysis and robustness assessment: walk-forward efficiency, parameter fragility, and capital failure thresholds.
What Kiploks Actually Does
Kiploks is NOT:
• a signal generator
• an optimizer
• a backtest visualizer
Kiploks IS:
• Trading strategy testing for robustness to real markets
• Stress-testing algorithms before real capital
• Risk and stability assessment of strategies — not just a backtest
If a strategy fails here, it fails before real money.
Core Questions
The stress-test questions that matter before deployment:
• Does this strategy survive out-of-sample?
• Which parameters are fragile?
• What market regimes break it?
• How much capital degrades the edge?
• Is performance driven by luck or structure?
Most platforms never answer these.
Kiploks does.
Robustness analysis
Detect overfitting, fragility, and regime dependence.
• Walk-forward validation
• Parameter sensitivity
• Market regime robustness
• Monte Carlo stability
Decision engine
Explainable verdicts instead of metric dumps.
• ROBUST / CAUTION / DO NOT DEPLOY
• Confidence scoring
• Risk classification
• Actionable warnings
Scalable Research
Scale research without losing reproducibility.
• Parallel execution
• Result deduplication
• Cache reuse
• Audit-friendly outputs
Decision Engine Flow
TESTS
• Out-of-sample decay
• Parameter instability
• Monte Carlo tail risk
• Regime failure detection
• Capital stress & capacity limits
VERDICT
Real examples of what Kiploks catches
Kiploks Analytics at a glance
The analysis below is built from real backtest data: a mixed outcome, with some strong metrics and some weak ones.
Final Verdict Summary
One screen answers the main question: launch the strategy now, wait, or drop it.
- Verdict - ROBUST (solid), CAUTION (needs attention), or DO NOT DEPLOY (not ready)
- Robustness score and chance of success - how reliable the strategy looks
- Checklist - did it pass validation, risk and cost checks? Green means you can consider deploying
- Short summary: what works, what fails, and what we recommend doing next
- Optional What-If table - see how the verdict changes if you tweak key numbers
FINAL VERDICT
Diagnostic case: Neutral / Incubate (© Kiploks)
One or more hard gates failed. DO NOT DEPLOY until blocking modules are fixed.
- Execution Buffer - Net Edge (Net Profit > 15 bps, period-level): -32.51 bps vs 15 bps (ESTIMATED) - net edge below 15 bps, or edge deficit after fees
- Stability (WFE > 0.5): 1.26 vs 0.5
- Data Quality Guard (test period ≥ 2 years): 769 vs 730 days
- Statistical Significance (t-Stat > 1.96)
- t-Stat (OOS Edge) > 2.0: -1.19 vs 2.0 - OOS edge not significant. Same metric as Statistical Significance with a stricter threshold (2.0 vs 1.96). Single OOS window - interpret with caution.
- Deployment is blocked because the following hard gate(s) failed: Execution Buffer - Net Edge (Net Profit > 15 bps, period-level).
DO NOT DEPLOY. Address failing hard gate(s): Execution Buffer - Net Edge (Net Profit > 15 bps, period-level). Then re-run analysis.
Robustness score is 0 because a module blocks (e.g. Risk, Execution, or Stability). Potential score if unblocked: 3. Fix blocking modules first. Even unblocked, score remains in TRASH range (0-20) - no meaningful improvement.
Kiploks Robustness Score
One number from 0 to 100 that shows how solid the strategy is. It is built from four parts:
- Validation (40%) - does the edge hold when we test it on data the strategy has not seen?
- Risk (30%) - reward vs risk, how deep drawdowns go, how well the strategy recovers
- Stability (20%) - small changes in settings do not break the strategy
- Execution (10%) - after real costs (fees, slippage), the edge is still there
If any critical part fails, the overall score can drop to 0 - one weak link affects the whole result.
A decent score still leaves room to improve on stability and execution after costs; the higher the score, the more robust the strategy.
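To make the weighting concrete, here is a minimal sketch of a weighted composite with hard-gate blocking. The weights match the breakdown above and the blocking threshold of 10 matches the Diagnosis example below; the function and field names are illustrative, not the Kiploks internals.

```python
# Minimal sketch of the weighted composite with hard-gate blocking.
# Weights follow the breakdown above; names and the exact gating rule
# are illustrative assumptions, not the Kiploks internals.
WEIGHTS = {"validation": 0.40, "risk": 0.30, "stability": 0.20, "execution": 0.10}
BLOCK_THRESHOLD = 10  # a module scoring below this blocks the whole result

def robustness_score(module_scores: dict[str, float]) -> float:
    """Module scores are 0-100; returns the 0-100 composite."""
    # Hard gate: one critically failing module zeroes the overall score.
    if any(score < BLOCK_THRESHOLD for score in module_scores.values()):
        return 0.0
    return sum(WEIGHTS[name] * score for name, score in module_scores.items())

# One weak link affects the whole result: strong validation cannot
# compensate a blocked execution module.
print(robustness_score({"validation": 80, "risk": 60, "stability": 55, "execution": 5}))   # 0.0
print(robustness_score({"validation": 80, "risk": 60, "stability": 55, "execution": 40}))  # 65.0
```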
ROBUSTNESS SCORE
Diagnosis
Execution score (5/100) is below the blocking threshold of 10. Edge does not survive 10 bps slippage - strategy may not be realizable in live conditions. Review transaction costs, reduce turnover, or improve edge.
Data Quality Guard
Checks that the data and run used for the analysis are trustworthy before we score robustness.
- Sampling - enough trades for reliable stats
- Gap density / price integrity - when OHLCV is available, checks for gaps and price sanity
- Verdict - PASS, FAIL, or REJECT; contribution to the overall score (e.g. 0-40 points)
If DQG fails or is rejected, the analysis may be blocked or flagged so you do not rely on weak data.
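For illustration, a minimal pandas sketch of two of these checks, gap density and basic price integrity, assuming a regular bar grid; the column and function names are hypothetical, not the Kiploks implementation.

```python
import pandas as pd

# Hypothetical gap-density and price-integrity checks on an OHLCV frame
# indexed by timestamp; assumes a regular bar grid at `freq`.
def gap_density(ohlcv: pd.DataFrame, freq: str = "1h") -> float:
    """Share of expected bars that are missing (0.0 = no gaps)."""
    expected = pd.date_range(ohlcv.index.min(), ohlcv.index.max(), freq=freq)
    return 1.0 - len(ohlcv.index.intersection(expected)) / len(expected)

def price_integrity(ohlcv: pd.DataFrame) -> bool:
    """Sanity: positive prices and low <= open/close <= high on every bar."""
    return bool(
        (ohlcv[["open", "high", "low", "close"]] > 0).all().all()
        and (ohlcv["low"] <= ohlcv[["open", "close"]].min(axis=1)).all()
        and (ohlcv["high"] >= ohlcv[["open", "close"]].max(axis=1)).all()
    )
```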
DATA QUALITY GUARD
Robust Net Edge (Safe Edge): No net profit - outlier check N/A
| Module | Score | Verdict |
|---|---|---|
| Gap Density | 100% | PASS |
| Outlier Influence | n/a | N/A |
| Look-Ahead Bias | 100% | PASS |
| Spread/Liquidity | 100% | PASS |
| Sampling & Over-fitting | 100% | PASS |
| Price Integrity | 100% | PASS |
Walk-Forward Validation
We tune the strategy on one time window, then check how it performs on the next one - so we see if the edge is real or just overfit to past data.
- Walk-forward efficiency - how much of the tuned performance carries over to the validation period
- Consistency - what share of validation windows are profitable
- Degradation - how much results drop from the tune period to the validation period
- Failed windows - how many periods lose money; many failures suggest overfitting
Charts show how equity evolves in tune vs validation, and returns per period. Final verdict: PASS or FAIL, with a short explanation.
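As a rough illustration of these four metrics, a sketch assuming one tune (in-sample) and one validation (out-of-sample) return per window; the names are illustrative, not the Kiploks API.

```python
import numpy as np

# Assumes one tune (IS) and one validation (OOS) return per window;
# names are illustrative, not the Kiploks API.
def walk_forward_stats(is_returns: list[float], oos_returns: list[float]) -> dict:
    is_r, oos_r = np.asarray(is_returns), np.asarray(oos_returns)
    # Walk-forward efficiency: share of the tuned edge that carries over.
    wfe = float(oos_r.sum() / is_r.sum()) if is_r.sum() != 0 else float("nan")
    return {
        "wfe": wfe,
        "consistency": float((oos_r > 0).mean()),     # share of profitable validation windows
        "degradation": float((is_r - oos_r).mean()),  # average drop from tune to validation
        "failed_windows": int((oos_r <= 0).sum()),    # many losing periods suggest overfitting
    }
```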
WALK-FORWARD VALIDATION
Walk-Forward Analysis — Continuous View
IS (In-Sample) + OOS (Out-of-Sample) equity on a single timeline
Benchmark Metrics
Summary of how well the strategy holds up when tested on data it was not tuned on.
- Walk-forward efficiency - how much of the optimized edge carries over to the next period (min, typical, max)
- Out-of-sample - average risk-adjusted returns, how often it stays profitable, chance of passing
- Kill-switch - how many bad periods in a row before we flag a problem
- Stability - how stable the edge is over time
- Regimes - how the strategy behaves in different market conditions (trend, range, high volatility)
Benchmark Metrics Verdict: READY, INCUBATE, CAUTION, or REJECT, plus a short summary of what it means.
BENCHMARK METRICS
Immediate Kill Switch triggered. Net Edge < 10 bps (current: -32.51 bps); Bayesian pass probability < 65% (current: 38%); Regime adaptability: 0/3 pass (min 1); Consecutive OOS drawdown windows: 3 (limit: 1)
If the next OOS window is negative → turn off the bot
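The rules quoted above read as a simple checklist. A sketch with the same thresholds; the function and argument names are illustrative.

```python
# The same thresholds as the report text above; function and argument
# names are illustrative.
def kill_switch_reasons(net_edge_bps: float, bayes_pass_prob: float,
                        regimes_passed: int, consecutive_dd_windows: int) -> list[str]:
    reasons = []
    if net_edge_bps < 10:
        reasons.append(f"Net Edge < 10 bps (current: {net_edge_bps} bps)")
    if bayes_pass_prob < 0.65:
        reasons.append(f"Bayesian pass probability < 65% (current: {bayes_pass_prob:.0%})")
    if regimes_passed < 1:
        reasons.append(f"Regime adaptability: {regimes_passed}/3 pass (min 1)")
    if consecutive_dd_windows > 1:
        reasons.append(f"Consecutive OOS drawdown windows: {consecutive_dd_windows} (limit: 1)")
    return reasons  # any reason => turn off the bot

# Values from the diagnostic case above trip all four rules.
print(kill_switch_reasons(-32.51, 0.38, 0, 3))
```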
Benchmark Comparison
How your strategy stacks up against BTC buy-and-hold: same period, same risk lens; a minimal calculation sketch follows the list.
- Growth rate (CAGR) - strategy vs benchmark
- Alpha - how much extra return you get over the benchmark
- Information ratio - excess return per unit of extra risk
- Correlation to BTC - how closely the strategy tracks the benchmark (diversification vs clone)
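A minimal sketch of these four comparisons, assuming aligned daily return series for the strategy and BTC buy-and-hold; note the simplified alpha (excess growth rather than regression alpha) and the assumed 365-day annualization.

```python
import numpy as np

def benchmark_comparison(strat: np.ndarray, btc: np.ndarray, periods: int = 365) -> dict:
    """strat/btc: aligned daily returns over the same period (assumed)."""
    def cagr(r: np.ndarray) -> float:
        return float((1 + r).prod() ** (periods / len(r)) - 1)

    excess = strat - btc
    ir = float(excess.mean() / excess.std() * np.sqrt(periods)) if excess.std() > 0 else float("nan")
    return {
        "strategy_cagr": cagr(strat),
        "benchmark_cagr": cagr(btc),
        "alpha": cagr(strat) - cagr(btc),  # simplified: excess growth, not regression alpha
        "information_ratio": ir,           # excess return per unit of tracking risk
        "correlation": float(np.corrcoef(strat, btc)[0, 1]),  # ~1 = clone, low = diversifier
    }
```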
BENCHMARK COMPARISON
Parameter Sensitivity & Stability
For each strategy setting: how sensitive it is - can you nudge it a bit without killing performance, or is it fragile?
- Table: best value, how far you can move it and still keep most of the edge, sensitivity, and a label - Stable, Reliable, Needs Tuning, or Fragile
- Governance - rules per parameter (e.g. time decay, liquidity limits, bounds, or only in certain market regimes)
- Diagnostics - stability over time, safety margin, how much performance drops when you move the param, signal strength
Deployment status: APPROVED, APPROVED (Conditional), REJECTED, or HOLD - plus overall parameter risk and which setting is the most sensitive.
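As an illustration of the fragility idea, a sketch that nudges one setting around its tuned value and checks how much of the edge survives; the step sizes, retention thresholds, and labels here are illustrative assumptions, not the Kiploks gates.

```python
# Illustrative one-parameter sweep: nudge a setting around its tuned
# value and measure how much of the edge survives. The +/-10-20% steps,
# retention thresholds, and labels are assumptions, not the Kiploks gates.
from typing import Callable

def fragility_label(perf: Callable[[float], float], best_value: float,
                    rel_steps: tuple[float, ...] = (-0.2, -0.1, 0.1, 0.2)) -> str:
    """perf(value) returns a performance metric (e.g. OOS Sharpe) for one setting."""
    base = perf(best_value)
    if base <= 0:
        return "Fragile"  # no edge at the tuned value to begin with
    # Worst share of the edge retained across all nudges.
    retention = min(perf(best_value * (1 + s)) / base for s in rel_steps)
    if retention >= 0.8:
        return "Stable"
    if retention >= 0.5:
        return "Needs Tuning"
    return "Fragile"
```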
PARAMETER SENSITIVITY & STABILITY
Trading Intensity & Cost Drag
How often the strategy trades, what costs eat into returns, how much size it can handle, and how execution quality affects results.
- Baseline: how much you trade per year, trades per month, typical holding time
- Efficiency: profit before vs after costs, how much cost eats the edge, average net profit per trade
- Break-even slippage: how much worse execution you can tolerate before the edge disappears, and what happens if you exceed it (see the sketch after this list)
- Cost breakdown: fees, slippage, market impact, total drag on returns, any rebates
- Capacity: how much capital the strategy can run before returns start to drop (e.g. -10%, -25%) or collapse; usage of daily volume
- Slippage vs size: how net returns change as you scale up and slippage grows
- Execution quality: order types, fill rates, opportunity cost, and risks from latency or toxic flow
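The break-even slippage logic reduces to simple arithmetic. A sketch, assuming a round trip of two sides with symmetric fees; all figures are in basis points and the example numbers are made up.

```python
# All figures in basis points; a round trip is two sides with symmetric
# fees. Example numbers are made up.
def break_even_slippage_bps(gross_edge_bps: float, fees_bps_per_side: float) -> float:
    """Tolerable slippage per side before the per-trade edge disappears."""
    return (gross_edge_bps - 2 * fees_bps_per_side) / 2

# 30 bps gross edge, 5 bps fees per side -> 10 bps of slippage headroom.
print(break_even_slippage_bps(30, 5))  # 10.0
```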
TRADING INTENSITY & COST DRAG
Execution: Simple (estimated fees)
- ADV $171.371 is very low; model assumptions may not hold.
- Participation ratio exceeds 15% of ADV; square-root model out of range.
- Market impact not included (model out of range); total is fees + slippage only.
- Rebate Capture is not included in Total Cost Drag; informational (potential savings with maker-heavy execution).
- When gross edge is negative, cost decomposition shows cost allocation; improving execution alone cannot make the strategy profitable.
- Long half-life from WFA validation; high-turnover strategies may have shorter effective decay in practice.
- High value reflects low institutional turnover denominator; most capital cost is in overlap periods.
Strategy Action Plan
Concrete next steps based on the analysis: what to fix first, how much slippage you can afford, and whether the strategy is ready to deploy.
- Action items - prioritised by impact (e.g. parameter tuning, execution, risk)
- Slippage tolerance - safe vs dangerous levels
- Deployment readiness - go / wait / fix before deploying
STRATEGY ACTION PLAN
Baseline Sharpe: from WFA OOS (window-level).
WFE 1.26 (biased, n=3) / WFE 0.00 (all windows, n=6)
Equity erodes as slippage increases.
- Allocation: 0% - strategy not viable (negative base Sharpe). Do not allocate.
- Monitoring: Observation without capital. Track Buy_rsi to detect when OOS Sharpe becomes positive.
- Runtime Kill Switch: TRIGGERED
- OOS Sharpe > 0 across a minimum of 2 consecutive windows
- Fail ratio drops below 33%
- WFE (all windows) above Phase 2 threshold for this strategy
- Manual review by risk manager
Strategy did not pass Phase 1 (NOT VIABLE). Phase 2 conditions do not apply until base Sharpe is positive.
Conflict A (Critical): Fail ratio > 50% (67% of windows failed). Use the pessimistic scenario; Phase 2 not applicable.
Bull Case: N/A - strategy not valid (negative base Sharpe).
- OOS retention may reflect a single regime; 0/3 regimes pass - strategy not validated across market conditions
- Statistical Significance: Extend the test by +2 years or add instruments with low cross-correlation (ρ < 0.3) to generate independent observations. Correlated instruments share the same market regime and do not increase the effective sample size. (High)
Risk Metrics (Out-of-Sample)
Risk measured on data the strategy was not tuned on: drawdowns, reward-to-risk ratios, and how stable the edge looks.
- Key metrics: worst peak-to-trough drop, how fast it recovers, risk-adjusted returns (Sharpe, Sortino), profit factor, gain vs pain, win rate; also worst-case loss at 95% and expected shortfall in bad tails
- Distribution: how reliable the edge is statistically, skew (asymmetry of returns), fat tails, and tail ratio
- Narratives (optional): market context, tail risk profile, where risk comes from, stability over time
Verdict (e.g. CAUTIOUS PASS) with a split by performance, sample size, and tail risk. Recommendation: status, what to do next, max leverage. Risk assessment: STABLE, CAUTION, or UNSTABLE, plus a capacity estimate.
RISK METRICS (OUT-OF-SAMPLE)
Out-of-sample risk metrics from Walk-Forward Analysis (stitched OOS equity curve or window returns).
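For reference, a sketch of how the headline metrics fall out of an OOS daily return series; the 365-day annualization and all names are assumptions, not the Kiploks implementation.

```python
import numpy as np

# Assumes a daily OOS return series (e.g. from the stitched OOS equity
# curve) and 365-day annualization; names are illustrative.
def oos_risk_metrics(returns: np.ndarray, periods: int = 365) -> dict:
    equity = (1 + returns).cumprod()
    drawdown = equity / np.maximum.accumulate(equity) - 1
    downside = returns[returns < 0]
    var_95 = float(np.quantile(returns, 0.05))
    return {
        "max_drawdown": float(drawdown.min()),  # worst peak-to-trough drop
        "sharpe": float(returns.mean() / returns.std() * np.sqrt(periods)),
        "sortino": float(returns.mean() / downside.std() * np.sqrt(periods)),
        "var_95": var_95,                       # worst-case daily loss at 95%
        "expected_shortfall": float(returns[returns <= var_95].mean()),  # mean of the bad tail
        "win_rate": float((returns > 0).mean()),
    }
```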
What Makes Kiploks Different
Most tools optimize performance.
Kiploks protects capital.
Why professionals use it:
• Walk-Forward as a first-class metric
• Parameter fragility detection
• Failure scenarios, not just ratios
• Honest capacity limits
• Explainable decision logic
No black boxes.
No “trust us”.
Who This Is For
Built for
Strategy development, testing, and deployment.
• Systematic traders
• Quant developers
• Small funds & prop desks
• Capital-aware professionals
• Bot and backtest builders
Not built for
• Signal sellers
• One-off backtesters
• “100% ROI” seekers
Frequently asked questions
What is Kiploks and what is it for?
Kiploks is an intelligent engine for analyzing trading strategies. We help traders and funds stress-test their trading ideas, measure the impact of slippage, and assess algorithm viability before going live.
How is Kiploks different from backtest platforms like QuantConnect?
Kiploks is not a backtester. It evaluates existing backtest and walk-forward results for robustness and survivability. Platforms like QuantConnect help you build and run backtests; Kiploks tells you whether those results are likely to hold up in live trading (overfitting, parameter fragility, capital risk).
What does the Kiploks Robustness Score mean?
The Kiploks Robustness Score is a 0-100 composite that reflects how well your strategy passes validation (walk-forward efficiency), risk (out-of-sample drawdowns), stability (parameter sensitivity), and execution (cost drag). A higher score means the strategy is more likely to survive real capital.
What is walk-forward analysis?
Walk-forward analysis splits history into rolling in-sample and out-of-sample windows. You optimize on in-sample and evaluate on out-of-sample to check if the edge carries over. Kiploks uses WFA results to compute Walk-Forward Efficiency (WFE) and other robustness metrics.
Why is the analysis free for now?
The project is in public beta. We believe quality analytics should be available to the community, and in return we rely on your feedback to make the system an industry standard.
What limits apply during beta?
During the public beta, every user has a promotional limit of 100 full analytical reports per month. Need more? If you are a professional researcher or developer, contact support and we will increase your limit individually.
How accurate are the results?
The system uses complex nonlinear models to assess strategy degradation. However, as we are actively testing, technical errors or inaccuracies may occur. We constantly calibrate the engine but recommend using the data as a supporting tool, not as gospel.
What responsibility does Kiploks have for my trading results?
Important: Kiploks provides analytical data "as is". We do not accept liability for any financial losses, lost profit, or trading errors arising from our reports.
Trading on financial markets involves high risk. The decision to enter a trade or run an algorithm is always yours.
How can I help the project?
The best help is your feedback. If you spot a bug, odd numbers in a report, or have an idea for a new feature — get in touch. We are building this tool for you.
What integrations does Kiploks support?
We support Freqtrade: send backtest and Walk-Forward Analysis results to Kiploks for robustness stress-testing. Setup is Docker-native and requires no patches. See the [Freqtrade integration] page and the [kiploks-freqtrade repository] on GitHub.
For advanced topics (slippage sensitivity, Kill Switch, NOT VIABLE verdict), see the [Methodology FAQ].
Stop trusting backtests.