Methodology
How backtest gates work
Every backtest you run is graded by a panel of statistical gates — pass/fail checks that ask one blunt question: is this strategy's edge real, or is it luck and overfitting? A high return or Sharpe ratio is easy to manufacture by accident; the gates exist to tell the difference.
Why a panel, not a single number
A backtest can look brilliant for boring reasons: it only tested today's surviving winners, it got lucky in one sector or one hot streak, or it's simply the best-looking of a hundred variations someone tried. Each gate attacks one of those specific ways a backtest can lie. The grading is deliberately conservative: it is built on walk-forward, out-of-sample testing, on bootstrap confidence intervals instead of point estimates, and on a Deflated Sharpe Ratio that penalizes the search itself (Bailey & López de Prado). The aim is to fail an over-fit strategy before you risk money on it, not after.
Pass, fail, and “diagnostic”
Most gates are hard: a strategy must clear all of them to qualify for paper deployment. The two cost gates are marked diagnostic — because TradePolaris ships signals and you control your own broker, execution, and slippage, a cost-stress failure is informational rather than disqualifying. Some gates (turnover, deployed capital, power) simply don't apply to every strategy type and auto-pass when they're not relevant.
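The pass, fail, and diagnostic rules above can be sketched as a small aggregation function. This is an illustration only, not TradePolaris's actual code: the `GateResult` type, its field names, and the gate labels used in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateResult:
    name: str                 # hypothetical label, for illustration only
    passed: Optional[bool]    # None means the gate does not apply to this strategy
    diagnostic: bool = False  # diagnostic gates inform but never block


def qualifies_for_paper(results: List[GateResult]) -> bool:
    """Return True when every applicable hard gate passed."""
    for result in results:
        if result.diagnostic:
            continue          # cost-stress gates are informational
        if result.passed is None:
            continue          # not applicable: auto-pass
        if not result.passed:
            return False      # a single failed hard gate disqualifies
    return True
```

For example, a strategy with a failed diagnostic gate and a non-applicable turnover gate still qualifies, while one failed hard gate is enough to block it.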
Is the edge real?
Significance tests: does the strategy actually have an edge, or could the results be luck?
After costs, the strategy earns a positive risk-adjusted return, and the edge is statistically real, not luck.
Annualized Sharpe of per-trade, post-cost returns. Passes when a moving-block bootstrap rejects “no edge” (Holm-Bonferroni corrected across the significance tests).
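A minimal sketch of how a moving-block bootstrap test and a Holm-Bonferroni correction can fit together. The block length, the number of resamples, the 0.05 significance level, and every function name are assumptions made for illustration; the page does not state the values actually used. The Sharpe here is left unannualized because a constant annualization factor does not change the p-value.

```python
import random
import statistics


def sharpe(returns):
    """Unannualized Sharpe: mean over sample standard deviation."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def block_resample(series, block_len, rng):
    """Rebuild a series of the same length from randomly placed contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def bootstrap_p_value(returns, block_len=5, n_boot=2000, seed=0):
    """One-sided p-value for the null hypothesis of no edge (mean return of zero)."""
    rng = random.Random(seed)
    observed = sharpe(returns)
    mean = statistics.fmean(returns)
    centered = [r - mean for r in returns]  # impose the null by removing the mean
    hits = sum(
        sharpe(block_resample(centered, block_len, rng)) >= observed
        for _ in range(n_boot)
    )
    return (hits + 1) / (n_boot + 1)


def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm correction: returns a reject flag for each p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

Under this sketch, a gate passes when its own p-value is still rejected after the correction is applied across all significance tests.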
The edge still shows up on data and symbols the strategy was never tuned on — the strongest sign it isn't overfit.
The same bootstrap Sharpe test, but on out-of-sample (held-out) trades only.
The strategy wins often enough for its style, beating what you'd expect by chance for that kind of strategy.
Share of profitable trades vs a style-specific baseline (momentum 38%, mean-reversion / event 55%) via a one-sided binomial test.
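The one-sided binomial test asks how likely it would be to see at least this many winning trades if the true win rate were only the baseline. A rough sketch follows, using the baselines quoted above; the dictionary keys and function name are hypothetical, and the significance level applied to the resulting p-value is not stated on this page.

```python
from math import exp, lgamma, log

# Baselines quoted in the gate description; the key names are illustrative.
BASELINE = {"momentum": 0.38, "mean_reversion": 0.55, "event": 0.55}


def binomial_p_value(wins, trades, baseline):
    """P(X >= wins) for X ~ Binomial(trades, baseline), with 0 < baseline < 1.

    Computed in log space so large trade counts do not overflow.
    """
    total = 0.0
    for k in range(wins, trades + 1):
        log_pmf = (
            lgamma(trades + 1) - lgamma(k + 1) - lgamma(trades - k + 1)
            + k * log(baseline) + (trades - k) * log(1.0 - baseline)
        )
        total += exp(log_pmf)
    return total
```

A small p-value means the observed win rate would be unlikely if the strategy only won as often as the baseline for its style.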
The strategy beats simply buying the index — not just the cash rate.
Annualized Sharpe of the daily return in excess of the benchmark (SPY); bootstrap-tested.
Is it robust, or borderline?
Confidence floors and worst-case checks — the edge has to hold up at the pessimistic end of the error bars, not just on average.
Even the pessimistic end of the error bar on Sharpe is comfortably positive — the edge isn't borderline.
Lower bound of the bias-corrected and accelerated (BCa) bootstrap confidence interval on net Sharpe must exceed 0.30.
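A hedged sketch of a BCa lower bound on the Sharpe ratio. It assumes plain i.i.d. resampling of trades, a one-sided 95% level, and an unannualized Sharpe; none of these choices is confirmed by this page, and all names are illustrative. Because Sharpe scales linearly with the annualization factor, the bound can be multiplied by that factor before comparing it with the 0.30 floor.

```python
import random
import statistics
from statistics import NormalDist


def sharpe(xs):
    sd = statistics.stdev(xs)
    return statistics.fmean(xs) / sd if sd > 0 else 0.0


def bca_lower_bound(returns, stat=sharpe, alpha=0.05, n_boot=2000, seed=0):
    """Lower end of a one-sided BCa bootstrap interval for `stat`."""
    rng = random.Random(seed)
    returns = list(returns)
    n = len(returns)
    theta = stat(returns)
    boots = sorted(stat(rng.choices(returns, k=n)) for _ in range(n_boot))

    # Bias correction: how far the bootstrap median sits from the estimate.
    frac = sum(b < theta for b in boots) / n_boot
    frac = min(max(frac, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = NormalDist().inv_cdf(frac)

    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [stat(returns[:i] + returns[i + 1:]) for i in range(n)]
    jmean = statistics.fmean(jack)
    num = sum((jmean - j) ** 3 for j in jack)
    den = 6.0 * sum((jmean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    # Adjusted percentile, then read it off the sorted bootstrap estimates.
    z = NormalDist().inv_cdf(alpha)
    adjusted = NormalDist().cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
    index = min(max(int(adjusted * n_boot), 0), n_boot - 1)
    return boots[index]
```

The same function applies unchanged to out-of-sample trades or to stress-cost returns; only the input series differs.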
The worst-case estimate of the out-of-sample edge is still clearly positive.
Lower bound of the BCa confidence interval on out-of-sample Sharpe must exceed 0.30.
For every $1 lost, the strategy reliably makes about $1.30+ — a real cushion, not breakeven.
Median profit factor (gross wins ÷ gross losses) across bootstrap resamples must exceed 1.3.
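As an illustration of this gate, the sketch below computes the profit factor and its median over bootstrap resamples. The i.i.d. resampling scheme, the resample count, and the function names are assumptions, not the documented implementation.

```python
import random
import statistics


def profit_factor(pnl):
    """Gross wins divided by gross losses (infinite when there are no losses)."""
    wins = sum(p for p in pnl if p > 0)
    losses = -sum(p for p in pnl if p < 0)
    return wins / losses if losses > 0 else float("inf")


def median_bootstrap_profit_factor(pnl, n_boot=2000, seed=0):
    """Median profit factor across resampled trade lists of the same length."""
    rng = random.Random(seed)
    pnl = list(pnl)
    n = len(pnl)
    return statistics.median(
        profit_factor(rng.choices(pnl, k=n)) for _ in range(n_boot)
    )
```

Using the median across resamples rather than the single observed value makes the check less sensitive to one or two outsized winning trades.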
Even in a bad-luck scenario (the worst 5%), the deepest peak-to-trough loss stays within a quarter of your capital.
The 95th-percentile bootstrap maximum drawdown must be ≤ 25% of capital.
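One way to picture this check: resample the trade sequence many times, measure the maximum drawdown of each resampled path, and read off the 95th percentile. The sketch assumes returns compound on a single equity curve and that trades are resampled independently; both are simplifications chosen for illustration, and the names are hypothetical.

```python
import random


def max_drawdown(returns):
    """Deepest peak-to-trough loss of a compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def drawdown_percentile(returns, pct=95, n_boot=2000, seed=0):
    """The pct-th percentile of maximum drawdown across bootstrap resamples."""
    rng = random.Random(seed)
    returns = list(returns)
    n = len(returns)
    drawdowns = sorted(
        max_drawdown(rng.choices(returns, k=n)) for _ in range(n_boot)
    )
    index = min(n_boot - 1, int(n_boot * pct / 100))
    return drawdowns[index]
```

Under these assumptions the gate would pass when `drawdown_percentile(returns) <= 0.25`.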
Is it overfit?
Anti-overfitting checks that penalize cherry-picking, one-off hot streaks, and thin samples.
After accounting for how many strategy variations were tried before this one was picked, there's at least a 90% chance the edge is genuine — not just the luckiest of many attempts.
Bailey & López de Prado's Deflated Sharpe Ratio — a Probabilistic Sharpe adjusted for skew, kurtosis, sample size and number of trials — must exceed 0.90. Computed on the per-trade return series.
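A compact sketch of the Deflated Sharpe Ratio as published by Bailey & López de Prado: a Probabilistic Sharpe Ratio measured against the Sharpe one would expect from the best of N unskilled trials. The function names, and the choice to pass the variance of the trial Sharpe ratios in as an argument, are assumptions for illustration; how the trial count and variance are tracked in practice is not described on this page.

```python
import math
import statistics
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, var_trial_sharpes):
    """Expected best Sharpe among n_trials strategies with no true edge."""
    if n_trials < 2:
        return 0.0
    nd = NormalDist()
    return math.sqrt(var_trial_sharpes) * (
        (1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(returns, n_trials, var_trial_sharpes):
    """Probability that the true Sharpe exceeds the expected-maximum benchmark."""
    n = len(returns)
    mean = statistics.fmean(returns)
    sd = statistics.pstdev(returns)
    sr = mean / sd  # unannualized, per-trade Sharpe
    skew = sum((r - mean) ** 3 for r in returns) / (n * sd ** 3)
    kurt = sum((r - mean) ** 4 for r in returns) / (n * sd ** 4)  # normal = 3
    benchmark = expected_max_sharpe(n_trials, var_trial_sharpes)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - benchmark) * math.sqrt(n - 1) / denom)
```

With a single trial the benchmark is zero and the result reduces to the ordinary Probabilistic Sharpe Ratio; every additional trial raises the benchmark and lowers the score.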
The edge is consistent through time — it doesn't all come from one hot streak.
At least 80% of rolling 50-trade windows must have a positive Sharpe.
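A rough sketch of the rolling-window check. It assumes windows advance one trade at a time and overlap, which the description above does not specify; the function names are illustrative.

```python
import statistics


def sharpe(xs):
    sd = statistics.stdev(xs)
    return statistics.fmean(xs) / sd if sd > 0 else 0.0


def positive_window_share(returns, window=50):
    """Fraction of rolling windows with a positive Sharpe.

    Returns None when there are fewer trades than one full window.
    """
    n = len(returns)
    if n < window:
        return None
    total = n - window + 1
    positive = sum(
        sharpe(returns[start:start + window]) > 0 for start in range(total)
    )
    return positive / total
```

Under these assumptions the gate would pass when `positive_window_share(returns) >= 0.80`.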
There are enough trades for the statistics to mean something — not a handful of lucky bets.
Trade count must meet a style minimum (mechanical 200, event 100, rare 30).
For rare-event strategies with few trades, checks there's still enough statistical signal to trust the result.
For event / rare strategies with fewer than 200 trades, Sharpe × √(trades) must exceed 2.0. Not applicable (auto-passes) for other strategies.
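The two sample-size checks above reduce to a few lines. This sketch assumes the Sharpe in the power check is the unannualized per-trade Sharpe, which makes the product behave like a t-statistic; the page does not say which Sharpe is used, and the style keys and function names are hypothetical.

```python
import math

# Minimum trade counts quoted above; the key names are illustrative.
MIN_TRADES = {"mechanical": 200, "event": 100, "rare": 30}


def trade_count_gate(style, n_trades):
    """True when the trade count meets the minimum for the strategy style."""
    return n_trades >= MIN_TRADES[style]


def power_gate(style, per_trade_sharpe, n_trades):
    """Sharpe x sqrt(trades) > 2.0 for thin event / rare samples.

    Returns None (treated as an auto-pass) when the gate does not apply.
    """
    if style not in ("event", "rare") or n_trades >= 200:
        return None
    return per_trade_sharpe * math.sqrt(n_trades) > 2.0
```

For instance, a rare-event strategy with 36 trades would need a per-trade Sharpe above roughly 0.33 to clear the power check under this reading.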
Did the backtest play fair?
Integrity of the simulation itself.
The backtest used point-in-time index membership — it didn't cheat by only testing today's surviving winners.
Passes when the run used historical (point-in-time) universe membership, avoiding survivorship bias.
Portfolio health
For multi-position strategies: is capital actually deployed, without excessive churn?
The strategy actually puts your money to work, rather than sitting in cash and overstating its risk-adjusted return.
Portfolio strategies only: the average fraction of capital deployed must be ≥ 60%. Auto-passes for single-name strategies.
The strategy doesn't churn the portfolio excessively — which would rack up costs and instability.
Portfolio strategies only: median per-rebalance turnover must be ≤ 0.30. Auto-passes for single-name strategies.
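The two portfolio-health thresholds can be sketched as follows. The turnover definition used here (half the sum of absolute weight changes at a rebalance) is a common convention adopted as an assumption; the page does not define turnover, and the function names are hypothetical.

```python
import statistics


def turnover(old_weights, new_weights):
    """One-way turnover at a rebalance: half the total absolute weight change."""
    names = set(old_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(k, 0.0) - old_weights.get(k, 0.0)) for k in names
    )


def deployed_capital_gate(deployed_fractions):
    """Average fraction of capital invested must be at least 60%."""
    return statistics.fmean(deployed_fractions) >= 0.60


def turnover_gate(rebalance_turnovers):
    """Median per-rebalance turnover must be at most 0.30."""
    return statistics.median(rebalance_turnovers) <= 0.30
```

Both functions are meant for portfolio strategies only; a single-name strategy would skip them and auto-pass.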
Cost stress — diagnostic
Because TradePolaris ships signals and you control your own execution and slippage, these gates inform but don't block deployment.
Even with noticeably worse trading costs, the edge survives.
Re-prices every trade with stressed (worse-than-default) slippage and re-tests the Sharpe.
Under worse-than-expected costs, even the pessimistic edge estimate stays above water.
Lower bound of the BCa confidence interval on the stress-cost Sharpe must exceed 0.0.
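To make the re-pricing step concrete, here is a minimal sketch that charges a flat slippage on each trade. The cost model (a fixed number of basis points per side, applied on entry and exit) and all names are assumptions; the actual default and stressed slippage levels are not given on this page.

```python
def stressed_returns(gross_returns, slippage_bps, round_trip=True):
    """Subtract a flat slippage charge, in basis points per side, from each trade."""
    sides = 2 if round_trip else 1
    cost = slippage_bps / 10_000 * sides
    return [g - cost for g in gross_returns]
```

The stressed series would then be fed through the same Sharpe test and confidence-interval calculation as the default-cost returns, with the lower bound compared against 0.0.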
What a passing score means (and doesn't)
Clearing every hard gate means the strategy's historical edge survived a demanding statistical exam — it is far less likely to be a fluke. It does not guarantee future performance. All backtests are hypothetical, computed on historical data with hindsight, and past or simulated results are not indicative of future returns. See our full disclaimer.