Methodology

How backtest gates work

Every backtest you run is graded by a panel of statistical gates — pass/fail checks that ask one blunt question: is this strategy's edge real, or is it luck and overfitting? A high return or Sharpe ratio is easy to manufacture by accident; the gates exist to tell the difference.

Why a panel, not a single number

A backtest can look brilliant for boring reasons: it only tested today's surviving winners, it got lucky in one sector or one hot streak, or it's simply the best-looking of a hundred variations someone tried. Each gate attacks one of those specific ways a backtest can lie. The grading is deliberately conservative: it is built on walk-forward, out-of-sample testing, bootstrap confidence intervals instead of point estimates, and a Deflated Sharpe Ratio that penalizes the search itself (Bailey & López de Prado). The aim is to fail an over-fit strategy before you risk money on it, not after.

Pass, fail, and “diagnostic”

Most gates are hard: a strategy must clear all of them to qualify for paper deployment. The two cost gates are marked diagnostic — because TradePolaris ships signals and you control your own broker, execution, and slippage, a cost-stress failure is informational rather than disqualifying. Some gates (turnover, deployed capital, power) simply don't apply to every strategy type and auto-pass when they're not relevant.
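
The pass, fail, and diagnostic rules can be read as a small filter over gate results. The sketch below is only an illustration: `GateResult` and `qualifies_for_paper` are hypothetical names, not TradePolaris identifiers, and it assumes each gate reports whether it is diagnostic and whether it applies to the strategy type.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    diagnostic: bool = False  # informational only, never blocks deployment
    applicable: bool = True   # False means the gate auto-passes


def qualifies_for_paper(results):
    """Return True when every applicable hard gate has passed."""
    return all(
        r.passed
        for r in results
        if r.applicable and not r.diagnostic
    )


example = [
    GateResult("net_sharpe", passed=True),
    GateResult("annualized_turnover", passed=False, applicable=False),
    GateResult("cost_sensitivity_sharpe", passed=False, diagnostic=True),
]
print(qualifies_for_paper(example))  # True: only net_sharpe counts here
```

Adding a single failed hard gate to `example` would flip the result to `False`, whatever the diagnostic gates report.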

Is the edge real?

Significance tests: does the strategy actually have an edge, or could the results be luck?

Net Sharpe (net_sharpe)

After costs, the strategy earns more than it risks — and the edge is statistically real, not luck.

Annualized Sharpe of per-trade, post-cost returns. Passes when a moving-block bootstrap rejects “no edge” (Holm-Bonferroni corrected across the significance tests).
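
A minimal sketch of how a moving-block bootstrap test and a Holm-Bonferroni correction might fit together. Several details are assumptions rather than documented settings: the block length, the number of resamples, the trades-per-year annualization factor, the significance level, and imposing the "no edge" null by centering returns at zero.

```python
import math
import random
import statistics


def sharpe(returns, periods_per_year):
    """Annualized Sharpe of a return series (0.0 if it has no variance)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)


def moving_block_resample(returns, block_len, rng):
    """Rebuild a series of the same length from random contiguous blocks."""
    n = len(returns)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(returns[start:start + block_len])
    return out[:n]


def bootstrap_p_value(returns, periods_per_year=50, block_len=5,
                      n_boot=2000, seed=0):
    """One-sided p-value for 'no edge' (assumed null: mean return is zero)."""
    rng = random.Random(seed)
    observed = sharpe(returns, periods_per_year)
    mean = statistics.fmean(returns)
    centered = [r - mean for r in returns]  # impose the null hypothesis
    hits = 0
    for _ in range(n_boot):
        sample = moving_block_resample(centered, block_len, rng)
        if sharpe(sample, periods_per_year) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: reject in ascending p-value order until one fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject
```

Under this reading, a significance gate passes when its entry in `holm_reject` is `True` after all the significance tests' p-values are corrected together.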

Out-of-sample Sharpe (oos_sharpe)

The edge still shows up on data and symbols the strategy was never tuned on — the strongest sign it isn't overfit.

The same bootstrap Sharpe test, but on out-of-sample (held-out) trades only.

Hit rate (hit_rate)

The strategy wins often enough for its style, beating what you'd expect by chance for that kind of strategy.

Share of profitable trades vs a style-specific baseline (momentum 38%, mean-reversion / event 55%) via a one-sided binomial test.
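
For illustration, the one-sided binomial test can be computed exactly from the binomial tail. The baselines come from the text above; the dictionary keys and the function name are hypothetical, and the significance threshold applied to the p-value is not specified here.

```python
import math

# Style baselines quoted above; the key names are illustrative only.
BASELINES = {"momentum": 0.38, "mean_reversion": 0.55, "event": 0.55}


def binomial_p_value(wins, trades, baseline):
    """P(X >= wins) for X ~ Binomial(trades, baseline)."""
    return sum(
        math.comb(trades, k) * baseline ** k * (1 - baseline) ** (trades - k)
        for k in range(wins, trades + 1)
    )


# A momentum strategy winning 100 of 200 trades against a 38% baseline.
p = binomial_p_value(100, 200, BASELINES["momentum"])
```

A small `p` means a win count this high would be unlikely if the true hit rate were only the baseline.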

Benchmark information ratio (benchmark_ir)

The strategy beats simply buying the index — not just the cash rate.

Annualized Sharpe of the daily return in excess of the benchmark (SPY); bootstrap-tested.
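
As a rough sketch, the point estimate behind this gate is the Sharpe of the day-by-day difference between strategy and benchmark returns. The 252-trading-day annualization factor and the function name are assumptions; the bootstrap test applied on top of it is omitted.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def benchmark_information_ratio(strategy_daily, benchmark_daily):
    """Annualized Sharpe of daily returns in excess of the benchmark."""
    if len(strategy_daily) != len(benchmark_daily):
        raise ValueError("return series must be aligned day by day")
    excess = [s - b for s, b in zip(strategy_daily, benchmark_daily)]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0
    return statistics.fmean(excess) / sd * math.sqrt(TRADING_DAYS)
```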

Is it robust, or borderline?

Confidence floors and worst-case checks — the edge has to hold up at the pessimistic end of the error bars, not just on average.

Sharpe lower-CI floor (sharpe_lower_ci_floor)

Even the pessimistic end of the error bar on Sharpe is comfortably positive — the edge isn't borderline.

Lower bound of the bias-corrected (BCa) bootstrap confidence interval on net Sharpe must exceed 0.30.
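
One way to compute a BCa lower bound with only the standard library is sketched below. It follows the textbook BCa recipe (bias correction from the bootstrap distribution, acceleration from a jackknife). The confidence level, the resample count, the trades-per-year factor, and resampling trades independently rather than in blocks are all assumptions made for brevity.

```python
import math
import random
import statistics

NORMAL = statistics.NormalDist()


def annualized_sharpe(returns, periods_per_year=50):
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)


def bca_lower_bound(returns, stat=annualized_sharpe, alpha=0.05,
                    n_boot=2000, seed=0):
    """Lower end of a one-sided BCa bootstrap interval for stat(returns)."""
    rng = random.Random(seed)
    n = len(returns)
    theta_hat = stat(returns)
    boots = sorted(stat(rng.choices(returns, k=n)) for _ in range(n_boot))

    # Bias correction: how far the bootstrap median sits from the estimate.
    below = sum(b < theta_hat for b in boots) / n_boot
    below = min(max(below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = NORMAL.inv_cdf(below)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(returns[:i] + returns[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    # Adjusted percentile, then read it off the sorted bootstrap values.
    z_alpha = NORMAL.inv_cdf(alpha)
    adjusted = NORMAL.cdf(z0 + (z0 + z_alpha) / (1 - accel * (z0 + z_alpha)))
    index = min(max(int(adjusted * n_boot), 0), n_boot - 1)
    return boots[index]
```

The gate would then compare `bca_lower_bound(net_trade_returns)` with the 0.30 floor, where `net_trade_returns` is a list of post-cost per-trade returns.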

OOS Sharpe lower-CI floor (oos_sharpe_lower_ci_floor)

The worst-case estimate of the out-of-sample edge is still clearly positive.

Lower bound of the BCa confidence interval on out-of-sample Sharpe must exceed 0.30.

Median profit factor (median_bootstrap_profit_factor)

For every $1 lost, the strategy reliably makes about $1.30+ — a real cushion, not breakeven.

Median profit factor (gross wins ÷ gross losses) across bootstrap resamples must exceed 1.3.
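
A short sketch of the median-over-resamples idea, assuming trades are resampled independently with replacement and that per-trade profit and loss is available as a plain list. The resample count and the function names are illustrative.

```python
import random
import statistics


def profit_factor(pnl):
    """Gross wins divided by gross losses (infinite if there are no losses)."""
    wins = sum(p for p in pnl if p > 0)
    losses = -sum(p for p in pnl if p < 0)
    return wins / losses if losses > 0 else float("inf")


def median_bootstrap_profit_factor(pnl, n_boot=2000, seed=0):
    """Median profit factor across bootstrap resamples of the trade list."""
    rng = random.Random(seed)
    n = len(pnl)
    return statistics.median(
        profit_factor(rng.choices(pnl, k=n)) for _ in range(n_boot)
    )
```

For example, `profit_factor([3.0, -1.0, 2.0, -1.0])` is 5.0 of gross wins over 2.0 of gross losses, which gives 2.5.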

p95 max drawdown (p95_bootstrap_max_drawdown)

Even in a bad-luck scenario (the worst 5%), the deepest peak-to-trough loss stays within a quarter of your capital.

The 95th-percentile bootstrap maximum drawdown must be ≤ 25% of capital.
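
The sketch below shows one plausible reading of this check. It assumes per-trade returns are expressed as fractions of capital and compounded, that drawdown is measured against the running equity peak, and that trades are resampled independently. None of these choices is confirmed by the text above.

```python
import random


def max_drawdown(returns):
    """Deepest peak-to-trough loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def p95_bootstrap_max_drawdown(returns, n_boot=2000, seed=0):
    """95th percentile of max drawdown across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(returns)
    drawdowns = sorted(
        max_drawdown(rng.choices(returns, k=n)) for _ in range(n_boot)
    )
    return drawdowns[min(int(0.95 * n_boot), n_boot - 1)]
```

With this definition the gate would pass when `p95_bootstrap_max_drawdown(trade_returns) <= 0.25`.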

Is it overfit?

Anti-overfitting checks that penalize cherry-picking, one-off hot streaks, and thin samples.

Deflated Sharpe (deflated_sharpe)

After accounting for how many strategy variations were tried before this one was picked, there's at least a 90% chance the edge is genuine — not just the luckiest of many attempts.

Bailey & López de Prado's Deflated Sharpe Ratio — a Probabilistic Sharpe adjusted for skew, kurtosis, sample size and number of trials — must exceed 0.90. Computed on the per-trade return series.
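
To make the adjustment concrete, here is a sketch of the published Deflated Sharpe Ratio formula applied to a per-trade return series. How the number of trials and the variance of Sharpe ratios across those trials are actually recorded is not described above, so `n_trials` and `sharpe_variance` are hypothetical inputs, with the variance expressed in per-trade Sharpe units.

```python
import math
import statistics

NORMAL = statistics.NormalDist()
EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials strategies with no real edge."""
    if n_trials < 2:
        return 0.0  # a single trial reduces to the Probabilistic Sharpe
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(returns, n_trials, sharpe_variance):
    """Probability that the true Sharpe beats the trial-adjusted hurdle."""
    n = len(returns)
    mean = statistics.fmean(returns)
    sd = statistics.pstdev(returns)
    sr = mean / sd  # per-trade (non-annualized) Sharpe
    skew = sum((r - mean) ** 3 for r in returns) / (n * sd ** 3)
    kurt = sum((r - mean) ** 4 for r in returns) / (n * sd ** 4)  # normal = 3
    hurdle = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NORMAL.cdf((sr - hurdle) * math.sqrt(n - 1) / denom)


trades = [0.03, -0.02, 0.01, -0.015, 0.005] * 40
single = deflated_sharpe(trades, n_trials=1, sharpe_variance=0.01)
searched = deflated_sharpe(trades, n_trials=100, sharpe_variance=0.01)
```

In this made-up example the same 200 trades clear 0.90 when they are the only variation tried, and fall far below it once 100 variations are assumed to have been searched.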

Trade-block stability (trade_block_stability)

The edge is consistent through time — it doesn't all come from one hot streak.

At least 80% of rolling 50-trade windows must have a positive Sharpe.
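
A small sketch of the rolling-window check, assuming windows advance one trade at a time. Because the standard deviation is never negative, a window's Sharpe is positive exactly when its mean return is positive, so the code only needs the sign of the mean.

```python
import statistics


def positive_window_share(returns, window=50):
    """Fraction of rolling windows whose mean return (hence Sharpe) is > 0."""
    if len(returns) < window:
        raise ValueError("need at least one full window of trades")
    starts = range(len(returns) - window + 1)
    positive = sum(
        statistics.fmean(returns[s:s + window]) > 0 for s in starts
    )
    return positive / len(starts)


def trade_block_stability_gate(returns, window=50, required_share=0.80):
    return positive_window_share(returns, window) >= required_share
```

A strategy whose first 50 trades all win and whose next 50 all lose would have fewer than half of its windows positive and would fail this check, even if its overall return were acceptable.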

Sample size (sample_size)

There are enough trades for the statistics to mean something — not a handful of lucky bets.

Trade count must meet a style minimum (mechanical 200, event 100, rare 30).

Power feasibility (power_feasibility)

For rare-event strategies with few trades, checks there's still enough statistical signal to trust the result.

For event / rare strategies with fewer than 200 trades, Sharpe × √(trades) must exceed 2.0. Not applicable (auto-passes) for other strategies.
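
The sample-size and power rules can be sketched together. The thresholds are the ones quoted above. The style keys and function names are hypothetical, and the code assumes the Sharpe in the power rule is the per-trade (non-annualized) figure, which makes Sharpe × √(trades) behave like a t-statistic.

```python
import math

# Style minimums quoted above; the key names are illustrative only.
MIN_TRADES = {"mechanical": 200, "event": 100, "rare": 30}


def sample_size_gate(style, n_trades):
    return n_trades >= MIN_TRADES[style]


def power_feasibility_gate(style, n_trades, per_trade_sharpe):
    """Auto-passes unless the strategy is event/rare with under 200 trades."""
    if style not in ("event", "rare") or n_trades >= 200:
        return True  # not applicable
    return per_trade_sharpe * math.sqrt(n_trades) > 2.0
```

For instance, a rare-event strategy with 36 trades needs a per-trade Sharpe above 2.0 / 6, roughly 0.33, to clear the power check under this reading.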

Did the backtest play fair?

Integrity of the simulation itself.

Universe integrity (universe_mode)

The backtest used point-in-time index membership — it didn't cheat by only testing today's surviving winners.

Passes when the run used historical (point-in-time) universe membership, avoiding survivorship bias.

Portfolio health

For multi-position strategies: is capital actually deployed, without excessive churn?

Avg deployed capital (avg_deployed_capital_fraction)

The strategy actually puts your money to work, rather than sitting in cash and overstating its risk-adjusted return.

Portfolio strategies only: the average fraction of capital deployed must be ≥ 60%. Auto-passes for single-name strategies.

Annualized turnover (annualized_turnover)

The strategy doesn't churn the portfolio excessively — which would rack up costs and instability.

Portfolio strategies only: median per-rebalance turnover must be ≤ 0.30. Auto-passes for single-name strategies.
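
A sketch of the two portfolio checks with the thresholds quoted above. The turnover definition is an assumption, since the text does not give one: the code uses one-way turnover, half the sum of absolute weight changes at a rebalance. The function names are hypothetical.

```python
import statistics


def rebalance_turnover(old_weights, new_weights):
    """One-way turnover between two weight maps (assumed definition)."""
    names = set(old_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(n, 0.0) - old_weights.get(n, 0.0)) for n in names
    )


def portfolio_health(deployed_fractions, turnovers, is_portfolio=True):
    """Evaluate both gates; single-name strategies auto-pass."""
    if not is_portfolio:
        return {"avg_deployed_capital_fraction": True,
                "annualized_turnover": True}
    return {
        "avg_deployed_capital_fraction":
            statistics.fmean(deployed_fractions) >= 0.60,
        "annualized_turnover": statistics.median(turnovers) <= 0.30,
    }
```

Swapping half the book, for example `rebalance_turnover({"A": 0.5, "B": 0.5}, {"A": 0.5, "C": 0.5})`, gives a turnover of 0.5 under this definition.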

Cost stress — diagnostic

Because TradePolaris ships signals and you control your own execution and slippage, these gates inform but don't block deployment.

Cost-sensitivity Sharpe (cost_sensitivity_sharpe, diagnostic)

Even with noticeably worse trading costs, the edge survives.

Re-prices every trade with stressed (worse-than-default) slippage and re-tests the Sharpe.
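
The re-pricing step might look like the sketch below. The cost model is purely illustrative: it assumes slippage is charged in basis points on both entry and exit and that the stress is a simple multiplier on the default. The actual default and stressed slippage values are not stated above.

```python
import statistics


def net_returns(gross_returns, slippage_bps, stress_multiplier=1.0):
    """Subtract an assumed round-trip slippage cost from each trade."""
    round_trip_cost = 2 * slippage_bps * stress_multiplier / 10_000
    return [g - round_trip_cost for g in gross_returns]


def per_trade_sharpe(returns):
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


gross = [0.012, -0.004, 0.009, -0.006, 0.015, 0.002]  # made-up trades
base = per_trade_sharpe(net_returns(gross, slippage_bps=5))
stressed = per_trade_sharpe(
    net_returns(gross, slippage_bps=5, stress_multiplier=2.0)
)
```

Because the stressed cost lowers every trade's return by the same amount, the stressed Sharpe is always below the base one. What the gate reports is whether the edge is still there after that shift.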

Cost-stress Sharpe lower-CI floor (cost_stress_sharpe_lower_ci_floor, diagnostic)

Under worse-than-expected costs, even the pessimistic edge estimate stays above water.

Lower bound of the BCa confidence interval on the stress-cost Sharpe must exceed 0.0.

What a passing score means (and doesn't)

Clearing every hard gate means the strategy's historical edge survived a demanding statistical exam — it is far less likely to be a fluke. It does not guarantee future performance. All backtests are hypothetical, computed on historical data with hindsight, and past or simulated results are not indicative of future returns. See our full disclaimer.
