The risk of p-hacking is why a p-value in trading strategy validation must be read carefully. A p-value is not the "probability the strategy works." It is the probability of seeing a result at least as extreme as the one observed, assuming a chosen null hypothesis and its modeling assumptions hold.

Why p-values break in trading

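Multiple testing: trying many strategies or parameter sets and reporting only the best one invalidates the naive p-value (Data snooping)
Non-stationarity: market regimes shift, so the stable-distribution and independence assumptions behind textbook tests rarely hold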
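
The multiple-testing problem can be shown with a small simulation. This is a minimal sketch, not a production test: it assumes i.i.d. Gaussian daily returns with zero mean, uses a normal approximation for the p-value, and every name in it (two_sided_p_value, smallest_p_value_among_noise, the 200-strategy count) is hypothetical.

```python
import math
import random
import statistics


def two_sided_p_value(returns):
    """p-value for the null 'mean return is zero' (normal approximation)."""
    n = len(returns)
    standard_error = statistics.stdev(returns) / math.sqrt(n)
    t_stat = statistics.fmean(returns) / standard_error
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for a standard normal.
    return math.erfc(abs(t_stat) / math.sqrt(2))


def smallest_p_value_among_noise(n_strategies, n_days, rng):
    """Best-looking p-value across strategies that have no edge by construction."""
    return min(
        two_sided_p_value([rng.gauss(0.0, 0.01) for _ in range(n_days)])
        for _ in range(n_strategies)
    )


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_strategies = 200       # assumption: number of variants tried
best_p = smallest_p_value_among_noise(n_strategies, n_days=500, rng=rng)

# Bonferroni: the simplest (and most conservative) multiple-testing correction.
bonferroni_threshold = 0.05 / n_strategies

print(f"smallest p-value among {n_strategies} noise strategies: {best_p:.4f}")
print(f"Bonferroni-adjusted threshold: {bonferroni_threshold:.5f}")
```

With 200 independent tests at the 5% level, the chance that at least one pure-noise strategy clears p < 0.05 is above 99.9%. The winner of a large search therefore needs a far stricter threshold than a single pre-specified test.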

Better habits than worshipping one p-value

Pre-register tests: write down the hypothesis, the metric, and the data split before running the backtest
Pair the p-value with walk-forward evidence (WFA)
Use robustness views that do not collapse to one star rating (Robustness Score)

Permutation framing

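A permutation p-value for a backtest is a related idea: it compares the observed result with results from shuffled data instead of relying on a textbook distribution. It is still not a substitute for honest costs and clean data (Monte Carlo).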
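
One way to sketch the permutation idea: shuffle the strategy's positions against the asset returns and count how often a shuffled version does at least as well as the real one. The function name permutation_p_value, the choice to shuffle positions, and the look-ahead example are assumptions made for illustration. Shuffling also destroys any autocorrelation in the positions, so treat this null as a rough one.

```python
import random


def permutation_p_value(positions, asset_returns, n_permutations=2000, seed=0):
    """One-sided permutation p-value for mean per-period P&L.

    Null: the timing of positions is unrelated to the returns, so any
    reordering of the positions would have done just as well.
    """
    rng = random.Random(seed)

    def mean_pnl(pos):
        return sum(p * r for p, r in zip(pos, asset_returns)) / len(pos)

    observed = mean_pnl(positions)
    shuffled = list(positions)  # copy so the caller's list is untouched
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mean_pnl(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed ordering itself, so p is never zero.
    return (1 + at_least_as_good) / (1 + n_permutations)


rng = random.Random(7)
asset_returns = [rng.gauss(0.0, 0.01) for _ in range(250)]

# Hypothetical leaky "strategy": long on up days, short on down days.
lookahead_positions = [1 if r > 0 else -1 for r in asset_returns]
p_lookahead = permutation_p_value(lookahead_positions, asset_returns)
print(f"permutation p-value with look-ahead positions: {p_lookahead:.4f}")
```

The look-ahead example earns a tiny p-value precisely because it cheats. The test measures surprise under the null and cannot tell a real edge from a data leak or a missing cost model.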

Effect size beats stars

Even a "significant" result can be economically meaningless after fees and slippage. Pair any statistical test with net performance, turnover, and stability across regimes. A tiny edge with huge variance is not a bankable strategy.
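
The arithmetic is worth writing out. The sketch below assumes a linear cost model (cost proportional to turnover) and uses made-up numbers: a 2 bps/day gross edge, full daily turnover, and 3 bps per unit of turnover. The helper names net_returns and annualized_mean are hypothetical.

```python
def net_returns(gross_returns, turnover, cost_per_unit_turnover):
    """Subtract a linear trading cost from each period's gross return."""
    return [g - t * cost_per_unit_turnover for g, t in zip(gross_returns, turnover)]


def annualized_mean(returns, periods_per_year=252):
    """Arithmetic mean return scaled to one year (no compounding)."""
    return sum(returns) / len(returns) * periods_per_year


gross = [0.0002] * 252   # assumption: 2 bps/day gross edge
turnover = [1.0] * 252   # assumption: the full book is traded every day
cost = 0.0003            # assumption: 3 bps per unit of turnover

net = net_returns(gross, turnover, cost)
gross_annual = annualized_mean(gross)
net_annual = annualized_mean(net)

print(f"gross annualized mean: {gross_annual:.2%}")
print(f"net annualized mean:   {net_annual:.2%}")
```

Under these assumptions a gross edge of roughly 5% a year becomes a loss of roughly 2.5% a year. No p-value computed on the gross series would reveal that.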

Confidence intervals and humility

Point estimates seduce; intervals humble. If your tooling can show uncertainty bands on metrics, use them. If it cannot, lean harder on walk-forward splits and out-of-sample discipline rather than one backtest p-value.
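
If the tooling has no uncertainty bands, a percentile bootstrap is one simple way to get a rough interval. This sketch resamples daily returns independently, which is an assumption that ignores autocorrelation and volatility clustering; a block bootstrap would be the more careful choice. The function name bootstrap_mean_ci and the simulated returns are illustrative only.

```python
import random
import statistics


def bootstrap_mean_ci(returns, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean return (i.i.d. resampling)."""
    rng = random.Random(seed)
    n = len(returns)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1 - confidence) / 2 * n_resamples)
    upper_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


rng = random.Random(1)
# Simulated returns: a small positive drift buried in much larger noise.
daily_returns = [rng.gauss(0.0005, 0.01) for _ in range(250)]

lower, upper = bootstrap_mean_ci(daily_returns)
point = statistics.fmean(daily_returns)
print(f"mean daily return: {point:.5f}, 95% interval: [{lower:.5f}, {upper:.5f}]")
```

With one year of daily data the interval is typically several times wider than the point estimate itself, which is the humility the interval is meant to provide.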

A concrete null hypothesis example (returns)

A common null is "mean daily return is zero" after subtracting a benchmark or after applying your cost model.

That can be useful as a sanity check, but it is still fragile if:

your sample has overlapping trades that break independence assumptions
your return definition changes between runs
a mismatched benchmark creates fake alpha

State the null in words a non-statistician can audit.
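
A minimal sketch of making that null explicit in code: the series under test is built in one visible place (strategy return minus benchmark return minus cost), and the docstring states the null in plain words. The function name, the flat per-day cost, and the five data points are all hypothetical, and the t-statistic assumes independent observations, which overlapping trades would violate.

```python
import math
import statistics


def excess_return_t_stat(strategy_returns, benchmark_returns, daily_cost):
    """Null in words: after subtracting the benchmark and a flat daily cost,
    the strategy's average daily return is zero.

    Returns (mean_excess, t_statistic). Assumes independent observations.
    """
    excess = [
        s - b - daily_cost for s, b in zip(strategy_returns, benchmark_returns)
    ]
    mean_excess = statistics.fmean(excess)
    standard_error = statistics.stdev(excess) / math.sqrt(len(excess))
    return mean_excess, mean_excess / standard_error


# Toy data, far too short for a real test; it only shows the mechanics.
strategy = [0.002, -0.001, 0.003, 0.000, 0.001]
benchmark = [0.001, -0.001, 0.001, 0.000, 0.000]

mean_excess, t_stat = excess_return_t_stat(strategy, benchmark, daily_cost=0.0001)
print(f"mean excess return: {mean_excess:.5f}, t-statistic: {t_stat:.3f}")
```

Keeping the return definition, benchmark, and cost model as explicit arguments makes it harder for any of them to change silently between runs.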

Why walk-forward beats a single p-value for deployment

A single-sample p-value is a snapshot. Deployment is a sequence of decisions under drift.

Walk-forward analysis forces repeated out-of-sample (OOS) cuts, which is closer to the question capital actually cares about: does the process keep working when it is re-fit and re-tested on data it has never seen, period after period?
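
The splitting logic itself is simple. The sketch below assumes a rolling (fixed-length) training window that advances by one test window at a time; an expanding window is an equally valid choice. The function name walk_forward_splits and the window sizes are hypothetical.

```python
def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts right after its train range, so no test
    observation is ever older than the data used to fit.
    """
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance so test windows never overlap


splits = list(walk_forward_splits(n_observations=10, train_size=4, test_size=2))
for train, test in splits:
    print(f"train {list(train)} -> test {list(test)}")
```

Each split produces its own out-of-sample result, so the output is a sequence of outcomes across time rather than one number from one sample.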