vol. 02 · tier 02 // ch. 08 of 10 · intermediate course
Backtesting Properly
"There are lies, damned lies, and backtests."
Read: ~5 min · Length: 1,024 words · Position: 08 of 10
A backtest run carelessly will tell you anything you want to hear. Done correctly, it’s the most powerful tool a serious trader has.
What backtesting is — and isn’t
Is: A simulation of your rules on historical data to estimate edge, drawdown, and consistency.
Isn’t: A guarantee of future performance. Markets evolve; edges decay.
Goal: gain statistical confidence that your idea has positive expectancy before risking capital.
The 7 deadly sins of backtesting
1. Look-ahead bias
Using information that wasn’t available at the time of the trade.
Example: screening for stocks that closed above their 50-day SMA and buying them at the same day’s open. At the open, the close is not yet known. Fix: trade the next day’s open.
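Here is a minimal pandas sketch of the fix. It assumes daily bars in a DataFrame with `open` and `close` columns. The column names, the toy prices, and the 3-bar SMA standing in for the 50-day one are all illustrative. The signal is computed from each day’s close and then shifted forward one bar, so it can only be acted on at the next open.

```python
import pandas as pd

# Toy daily bars; column names and prices are illustrative.
bars = pd.DataFrame(
    {
        "open":  [100.0, 101.0, 103.0, 102.0, 106.0, 108.0],
        "close": [101.0, 103.0, 102.0, 106.0, 107.0, 109.0],
    },
    index=pd.bdate_range("2024-01-01", periods=6),
)

# Known only AFTER the close: close above its 3-bar SMA
# (3 bars instead of 50 so the toy data produces signals).
sma = bars["close"].rolling(3).mean()
signal_at_close = bars["close"] > sma

# The look-ahead version would buy at bars["open"] on the SAME row.
# Correct version: shift the signal one bar, then buy at that day's open.
enter_next_open = signal_at_close.shift(1, fill_value=False)
entry_price = bars["open"].where(enter_next_open)  # NaN where no entry

print(pd.DataFrame({"signal": signal_at_close,
                    "enter": enter_next_open,
                    "entry_price": entry_price}))
```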
2. Survivorship bias
Backtesting only on today’s universe ignores stocks that were delisted or went bankrupt. Most “buy and hold the Nifty 50” backtests are inflated because today’s Nifty 50 is made up of the survivors.
Fix: Use a point-in-time universe (the actual constituents on each historical date).
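One way to hold a point-in-time universe is a list of dated membership snapshots, looked up with a binary search. This is only a sketch. The tickers and dates below are made up, and `universe_on` is a hypothetical helper, not a library function. Notice that `CCC`, which is dropped in 2020, still shows up in a 2019 lookup. A survivors-only list would hide exactly that.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical membership snapshots: (effective date, constituents), sorted.
# Real data would come from historical index-change announcements.
SNAPSHOTS = [
    (date(2019, 1, 1), {"AAA", "BBB", "CCC"}),
    (date(2020, 6, 1), {"AAA", "BBB", "DDD"}),  # CCC out, DDD in
    (date(2022, 3, 1), {"AAA", "DDD", "EEE"}),  # BBB out, EEE in
]
_DATES = [d for d, _ in SNAPSHOTS]


def universe_on(day: date) -> set[str]:
    """Constituents that were actually in the index on `day`."""
    i = bisect_right(_DATES, day) - 1  # latest snapshot effective on/before day
    if i < 0:
        raise ValueError(f"no membership data before {_DATES[0]}")
    return SNAPSHOTS[i][1]


print(sorted(universe_on(date(2020, 1, 15))))  # ['AAA', 'BBB', 'CCC']
print(sorted(universe_on(date(2023, 1, 2))))   # ['AAA', 'DDD', 'EEE']
```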
3. Overfitting / curve fitting
Tweaking parameters until the backtest looks beautiful. RSI(13), close > EMA(47), only Tuesdays — your “edge” is noise.
Fix:
- Use few parameters (≤ 3).
- Test on out-of-sample data (split your history).
- Use walk-forward analysis.
4. Ignoring costs
Brokerage, STT, slippage, and taxes are all real costs. A strategy with a 0.05% edge per trade dies after costs.
Fix: Add realistic transaction costs (≥ 0.1% round-trip for equity, more for low-liquidity stocks). Model slippage as 0.1–0.3% on market orders.
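The sketch below applies those costs to a single long trade. The defaults are assumptions picked from the ranges above: 0.1% round-trip for brokerage and taxes, and 0.2% slippage. The text does not say whether the slippage range applies per side or per round trip, so this sketch treats it as per side.

```python
def net_trade_return(entry_price: float, exit_price: float,
                     round_trip_cost: float = 0.001,
                     slippage_per_side: float = 0.002) -> float:
    """Net return of one long trade after slippage and fixed costs.

    Assumptions: round_trip_cost (brokerage, STT and other charges for entry
    plus exit) is a flat fraction of trade value, and slippage moves each
    fill against you by slippage_per_side.
    """
    fill_in = entry_price * (1 + slippage_per_side)   # buy fills higher
    fill_out = exit_price * (1 - slippage_per_side)   # sell fills lower
    gross = fill_out / fill_in - 1
    return gross - round_trip_cost


# A raw 0.05% edge (100 -> 100.05) turns into roughly -0.45% after costs.
print(f"{net_trade_return(100.0, 100.05):.2%}")
```

With these defaults, slippage alone costs about 0.4% per trade, so a gross edge has to clear roughly half a percent before it shows up as profit.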
5. Position sizing fantasy
“What if I’d bought 1,000 shares?” — but you only had ₹2 lakh. Use realistic sizing tied to actual capital and risk %.
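A common way to tie size to capital and risk is fixed-fractional sizing. You risk a set percentage of capital on the distance between entry and stop, then cap the share count by the cash available. This is a minimal sketch that assumes a long-only account with no leverage. `position_size` is a hypothetical helper.

```python
import math


def position_size(capital: float, risk_pct: float,
                  entry: float, stop: float) -> int:
    """Shares to buy so that a stop-out loses about risk_pct of capital.

    Also capped so the position never costs more than the available capital
    (no leverage assumed).
    """
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        raise ValueError("stop must be below entry for a long trade")
    by_risk = math.floor(capital * risk_pct / risk_per_share)
    by_cash = math.floor(capital / entry)
    return min(by_risk, by_cash)


# ₹2 lakh capital, 1% risk, entry 500, stop 480:
# risk budget 2,000 / 20 per share = 100 shares (costs 50,000, affordable).
print(position_size(200_000, 0.01, 500.0, 480.0))  # 100
```

With a very tight stop (entry 500, stop 499), the risk rule alone would allow 2,000 shares. The cash cap limits the position to 400 shares, which is the point of sizing against actual capital.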
6. Selection bias
“My strategy works on Reliance, Infy, and HDFC Bank!” — yes, because you cherry-picked them after seeing the data.
Fix: Test on the entire liquid universe, not 5 favorites.
7. Cherry-picking timeframes
Picking the 3-year window where the strategy worked. Test across multiple regimes (2008–2009 bear, 2010–2014 sideways, 2014–2020 bull, 2020 crash, 2020–2024 recovery, 2024 sideways).
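A simple guard against cherry-picked windows is to report results per regime and per calendar year side by side. The sketch assumes a pandas Series of daily strategy returns with a DatetimeIndex. The returns are synthetic, and the regime boundaries are illustrative placeholders, because the exact start and end of each regime is a judgment call.

```python
import numpy as np
import pandas as pd


def compound(r: pd.Series) -> float:
    """Total compounded return of a series of periodic returns."""
    return float((1 + r).prod() - 1)


def by_regime(daily: pd.Series, regimes: dict[str, tuple[str, str]]) -> pd.Series:
    """Compounded strategy return inside each inclusive (start, end) window."""
    return pd.Series({name: compound(daily.loc[start:end])
                      for name, (start, end) in regimes.items()})


# Synthetic daily strategy returns, only so the example runs.
idx = pd.bdate_range("2019-01-01", "2021-12-31")
daily = pd.Series(np.random.default_rng(0).normal(0.0004, 0.01, len(idx)), index=idx)

# Illustrative boundaries; choose and justify your own for each regime.
regimes = {
    "pre-crash":  ("2019-01-01", "2020-01-31"),
    "2020 crash": ("2020-02-01", "2020-03-31"),
    "recovery":   ("2020-04-01", "2021-12-31"),
}
print(by_regime(daily, regimes))
print(daily.groupby(daily.index.year).apply(compound))  # per calendar year
```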
A proper backtest workflow
Step 1: Data
- Get clean, adjusted OHLCV data (split & dividend adjusted).
- Sources: Yahoo Finance, NSE Bhavcopy, Kite Connect historical data; paid vendors such as Truedata and GFDL.
- Validate: spot-check against charts, and look for missing days and suspicious zeros.
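Part of that validation can be automated. This sketch assumes a daily DataFrame indexed by date with `open`, `high`, `low`, `close` and `volume` columns. It also assumes a caller-supplied calendar of real trading sessions; for NSE, that means weekdays minus exchange holidays, and building it is not shown. The check flags missing sessions, zero or negative prices, zero-volume days, and bars where `high` and `low` do not bound the other prices. It complements spot-checking against charts rather than replacing it.

```python
import pandas as pd


def validate_ohlcv(df: pd.DataFrame, sessions: pd.DatetimeIndex) -> dict[str, list]:
    """Return dates that look suspicious in a daily OHLCV frame.

    `sessions` must be the exchange's real trading calendar (assumed to be
    supplied by the caller).
    """
    prices = df[["open", "high", "low", "close"]]
    checks = {
        "zero_or_negative": (prices <= 0).any(axis=1),
        "zero_volume": df["volume"] == 0,
        # high must be the bar's maximum and low its minimum
        "bad_high_low": (df["high"] < prices.max(axis=1))
                        | (df["low"] > prices.min(axis=1)),
    }
    report = {"missing_days": list(sessions.difference(df.index))}
    report.update({name: list(df.index[mask.to_numpy()])
                   for name, mask in checks.items()})
    return report


sessions = pd.bdate_range("2024-01-01", periods=5)  # stand-in calendar
df = pd.DataFrame(
    {"open":   [100, 101, 102, 103],
     "high":   [102, 103, 104, 102],   # 2024-01-04: high below close
     "low":    [99, 100, 0, 101],      # 2024-01-03: a "weird zero"
     "close":  [101, 102, 103, 104],
     "volume": [1000, 0, 900, 800]},   # 2024-01-02: no volume
    index=sessions[:4],                # 2024-01-05 is missing
)
for name, dates in validate_ohlcv(df, sessions).items():
    print(name, [d.date() for d in dates])
```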
Step 2: Define rules unambiguously
Pseudocode it. Every condition explicit. No “use judgment here.”
```
# Entry
if (close > sma_50) and (close > sma_200) and (rsi_14 crosses above 50):
    enter_long(at=next_open, size=risk_pct(0.01))
    set_stop(at=lowest_low_5)
    set_target(at=2 * (entry - stop) + entry)

# Exit
if price hits stop or target or 20 bars elapsed:
    exit()
```
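The same rules can be written as a runnable pandas sketch. Several choices here are assumptions that the pseudocode leaves open:

- RSI uses Wilder-style exponential smoothing.
- The 5-bar lowest low is measured up to and including the signal bar.
- A bar that touches both stop and target counts as a stop (the conservative reading).
- Gaps through the stop or target fill at the open.
- Only one position is held at a time.
- Position sizing is left out, so each trade is reported as entry and exit prices.

The price data is a synthetic random walk, used only to make the example run.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    """Wilder-style RSI via exponential smoothing with alpha = 1/n."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / n, adjust=False).mean()
    loss = (-delta).clip(lower=0).ewm(alpha=1 / n, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)


def signals(df: pd.DataFrame) -> pd.Series:
    """True on bars whose close satisfies the entry rules."""
    close = df["close"]
    r = rsi(close)
    crosses_above_50 = (r > 50) & (r.shift(1) <= 50)
    return ((close > close.rolling(50).mean())
            & (close > close.rolling(200).mean())
            & crosses_above_50)


def backtest(df: pd.DataFrame, max_bars: int = 20) -> list[dict]:
    """One long position at a time; returns a list of trade records."""
    sig = signals(df).to_numpy()
    op, hi, lo, cl = (df[k].to_numpy() for k in ("open", "high", "low", "close"))
    lowest_low_5 = df["low"].rolling(5).min().to_numpy()
    trades, t = [], 0
    while t < len(df) - 1:
        if not sig[t]:
            t += 1
            continue
        e = t + 1                                  # enter at NEXT bar's open
        entry, stop = op[e], lowest_low_5[t]
        if entry <= stop:                          # gapped below stop: skip
            t += 1
            continue
        target = entry + 2 * (entry - stop)        # 2R target
        last = min(e + max_bars - 1, len(df) - 1)  # 20-bar time exit
        for i in range(e, last + 1):
            if lo[i] <= stop:                      # stop checked first
                exit_px, reason = min(op[i], stop), "stop"
                break
            if hi[i] >= target:
                exit_px, reason = max(op[i], target), "target"
                break
        else:
            exit_px, reason = cl[last], "time"
        trades.append({"entry_bar": e, "exit_bar": i, "entry": entry,
                       "exit": exit_px, "reason": reason})
        t = i                  # a signal on the exit bar may enter next bar
    return trades


# Synthetic random-walk bars, only so the sketch runs end to end.
rng = np.random.default_rng(7)
n = 1500
close = 100 * np.exp(np.cumsum(rng.normal(0.0004, 0.012, n)))
open_ = np.r_[close[0], close[:-1]] * (1 + rng.normal(0, 0.003, n))
high = np.maximum(open_, close) * (1 + np.abs(rng.normal(0, 0.004, n)))
low = np.minimum(open_, close) * (1 - np.abs(rng.normal(0, 0.004, n)))
df = pd.DataFrame({"open": open_, "high": high, "low": low, "close": close},
                  index=pd.bdate_range("2015-01-01", periods=n))

trades = backtest(df)
print(f"{len(trades)} trades")
for tr in trades[:3]:
    print(tr)
```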
Step 3: Train / Test split
- In-sample (IS) — 70% of data. Develop and tune.
- Out-of-sample (OOS) — 30% of data. Test once. Don’t peek during development.
If OOS performance ≪ IS, you overfit. Throw it out.
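The split must be chronological. Shuffling rows before splitting would leak future information into the in-sample set. This is a minimal sketch that assumes a time-indexed pandas DataFrame; `chronological_split` is a hypothetical helper name.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, is_frac: float = 0.7):
    """Split time-ordered data into in-sample and out-of-sample parts.

    No shuffling: the OOS block is strictly later than the IS block,
    just as live trading always comes after development.
    """
    if not df.index.is_monotonic_increasing:
        raise ValueError("index must be sorted by time")
    cut = int(len(df) * is_frac)
    return df.iloc[:cut], df.iloc[cut:]


bars = pd.DataFrame({"close": range(10)},
                    index=pd.bdate_range("2024-01-01", periods=10))
ins, oos = chronological_split(bars)
print(len(ins), len(oos))  # 7 3
```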
Step 4: Walk-forward analysis
A more robust version of train/test:
Train Jan 2018 – Dec 2019 → Test Jan 2020 – Jun 2020
Train Jul 2018 – Jun 2020 → Test Jul 2020 – Dec 2020
Train Jan 2019 – Dec 2020 → Test Jan 2021 – Jun 2021
... etc
Concatenate the OOS test results into a single track record. This is the closest a backtest gets to simulating live trading.
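The rolling schedule above (24-month train, 6-month test, stepping forward 6 months) can be generated rather than typed out. This sketch uses only the standard library. The windows are half-open, so each end date is the first day of the month after the window, and `start` is assumed to fall on the 1st of a month. Fitting on each train window and collecting the OOS results is left to your own backtest code.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by whole months."""
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, 1)


def walk_forward_windows(start: date, end: date,
                         train_months: int = 24, test_months: int = 6):
    """Yield (train_start, test_start, test_end) for rolling windows.

    Train covers [train_start, test_start), test covers [test_start, test_end).
    Each step slides the whole window forward by test_months, so the test
    periods are back to back and can be concatenated.
    """
    train_start = start
    while True:
        test_start = add_months(train_start, train_months)
        test_end = add_months(test_start, test_months)
        if test_end > end:
            return
        yield train_start, test_start, test_end
        train_start = add_months(train_start, test_months)


for tr, te, te_end in walk_forward_windows(date(2018, 1, 1), date(2022, 1, 1)):
    print(f"train {tr} to {te} (excl.) -> test {te} to {te_end} (excl.)")
```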
Step 5: Compute the right metrics
Don’t just look at total return.
| Metric | What it tells you | Good range |
|---|---|---|
| CAGR | Compound annual growth rate | > index + 5% |
| Max Drawdown (MDD) | Worst peak-to-trough decline | < 25% |
| Sharpe Ratio | Excess return per unit of volatility | > 1 (good), > 2 (great) |
| Sortino Ratio | Excess return per unit of downside volatility | > 1.5 |
| Calmar Ratio | CAGR / MDD | > 1 |
| Win rate | % winning trades | Strategy-dependent |
| Avg win / Avg loss | Payoff ratio | > 1.5 for low win-rate systems |
| Expectancy / trade | (Win% × AvgWin) − (Loss% × AvgLoss) | > 0 after costs |
| Max consecutive losses | Worst losing streak | Affects psychology |
| # trades | Sample size | ≥ 100 (statistical validity) |
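Most of these metrics take only a few lines of NumPy. This is a sketch built on several assumptions:

- Returns are daily, with 252 sessions a year.
- The risk-free rate is constant (default 0).
- Sortino uses the risk-free rate as its target.
- Zero-return trades count as losses.

Libraries compute some of these slightly differently, so compare definitions before comparing numbers across tools.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: daily bars, ~252 sessions a year


def equity_metrics(daily_returns, rf_annual: float = 0.0) -> dict:
    """CAGR, max drawdown, Sharpe, Sortino and Calmar from daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    equity = np.concatenate([[1.0], np.cumprod(1 + r)])  # start at 1.0
    years = len(r) / TRADING_DAYS
    cagr = equity[-1] ** (1 / years) - 1
    peak = np.maximum.accumulate(equity)
    mdd = float(np.max(1 - equity / peak))
    excess = r - rf_annual / TRADING_DAYS              # simple daily rf
    sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS)
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    sortino = excess.mean() / downside * np.sqrt(TRADING_DAYS)
    return {"CAGR": cagr, "MDD": mdd, "Sharpe": sharpe, "Sortino": sortino,
            "Calmar": cagr / mdd if mdd > 0 else float("inf")}


def trade_metrics(trade_returns) -> dict:
    """Win rate, payoff ratio, expectancy and worst losing streak."""
    t = np.asarray(trade_returns, dtype=float)
    wins, losses = t[t > 0], t[t <= 0]          # zero counted as a loss
    win_rate = len(wins) / len(t)
    avg_win = wins.mean() if len(wins) else 0.0
    avg_loss = -losses.mean() if len(losses) else 0.0
    streak = worst = 0
    for x in t:
        streak = streak + 1 if x <= 0 else 0
        worst = max(worst, streak)
    return {"trades": len(t), "win_rate": win_rate,
            "payoff": avg_win / avg_loss if avg_loss > 0 else float("inf"),
            "expectancy": win_rate * avg_win - (1 - win_rate) * avg_loss,
            "max_consecutive_losses": worst}


rng = np.random.default_rng(1)
print(equity_metrics(rng.normal(0.0005, 0.01, 3 * TRADING_DAYS)))
print(trade_metrics([0.02, -0.01, -0.01, 0.03, -0.01, 0.02]))
```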
Step 6: Stress test
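- Monte Carlo: shuffle the trade sequence 10,000 times and study the distribution of outcomes. The final return does not depend on trade order, but drawdowns and losing streaks do.
- Parameter sensitivity: vary key parameters by ±20%. The strategy should still be profitable; if performance collapses, it is overfit.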
- Regime breakdown: Plot equity curve by year and by market regime. Look for hidden weakness.
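This sketch of the Monte Carlo step uses only the standard library. Reordering trades does not change the final equity, because the product of the returns stays the same. What the shuffle reveals is the range of drawdowns the same trades could have produced. The trade list is made up, and compounding each trade’s return on full equity is an assumption.

```python
import random


def max_drawdown(trade_returns) -> float:
    """Worst peak-to-trough fall of equity compounded trade by trade."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def monte_carlo_drawdowns(trade_returns, runs: int = 10_000,
                          seed: int = 42) -> list[float]:
    """Max drawdown of `runs` random reorderings of the same trades, sorted."""
    rng = random.Random(seed)       # seeded so results are reproducible
    shuffled = list(trade_returns)  # copy; the caller's list is untouched
    results = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        results.append(max_drawdown(shuffled))
    return sorted(results)


# Made-up per-trade returns (fractions of equity), 100 trades.
trades = [0.04, -0.02, 0.03, -0.02, -0.02, 0.05, -0.01, 0.02, -0.03, 0.04] * 10
dd = monte_carlo_drawdowns(trades)
print(f"as traded: {max_drawdown(trades):.1%}")
print(f"median: {dd[len(dd) // 2]:.1%}  95th percentile: {dd[int(0.95 * len(dd))]:.1%}")
```

Sizing and risk limits can then be planned around the 95th-percentile drawdown rather than the single historical path.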
Step 7: Forward / paper test
A passing backtest earns the right to be paper-traded. Run it live (no real money) for 2–3 months. Compare to backtest expectations. If reality matches → consider going live with small capital.
Typical pitfalls in Indian markets
- Bonus / split adjustments: using unadjusted data wrecks long-term backtests.
- Circuit limits: a stock may “show” a price but be untradable (locked in upper or lower circuit).
- Holiday calendar: use the NSE trading calendar with its exchange-specific holidays, not a generic weekday calendar.
- F&O rollovers: handle contract expiries and the switch to the next contract carefully when building continuous series.
- Brokerage changes: much of Indian retail brokerage moved from %-based fees to a flat ₹20 per order in 2015. Backtests covering earlier years need cost models that reflect the fees of that time.
Tools for backtesting
| Tool | Type | Notes |
|---|---|---|
| Backtrader (Python) | Event-driven | Mature, flexible, free |
| VectorBT (Python) | Vectorized | Very fast, good for parameter sweeps |
| Zipline (Python) | Event-driven | Originally Quantopian’s; now community-maintained |
| TradingView | Pine Script | Easy, but limited (no proper position sizing in basic) |
| AmiBroker | AFL | Indian retail favorite, fast, paid |
| MetaTrader 5 | MQL | Forex/CFDs primarily |
| Custom (DuckDB + pandas) | DIY | Maximum control |
A sanity-check rule
If a strategy shows:
- 80% win rate
- 5+ Sharpe
- Drawdown < 5%
- 200% annualized return
…it is almost certainly overfit, contaminated by look-ahead bias, or a bug. Real, tradeable edges look modest: a 50–60% win rate, a Sharpe of 1–2, drawdowns of 15–25%, and a CAGR a few percentage points above the index.
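The checklist can be encoded as a quick automated guard. The thresholds come straight from the list above. Treating each one as a separate red flag, rather than requiring all four, is an interpretation. `red_flags` is a hypothetical helper.

```python
def red_flags(win_rate: float, sharpe: float, max_dd: float, cagr: float) -> list[str]:
    """Which 'too good to be true' thresholds a backtest result hits.

    Inputs are fractions (0.8 = 80%). Any flag is a reason to hunt for
    overfitting, look-ahead bias, or a bug before trusting the result.
    """
    checks = [
        (win_rate >= 0.80, "win rate >= 80%"),
        (sharpe >= 5, "Sharpe >= 5"),
        (max_dd < 0.05, "max drawdown < 5%"),
        (cagr >= 2.00, "CAGR >= 200%"),
    ]
    return [msg for hit, msg in checks if hit]


print(red_flags(win_rate=0.55, sharpe=1.4, max_dd=0.18, cagr=0.16))  # []
print(red_flags(win_rate=0.82, sharpe=6.1, max_dd=0.03, cagr=2.5))
```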
Modesty in backtest results is a sign of honesty.