Backtesting · June 12, 2026 · 12 min read

Inside the Mercurio Backtest: How We Turn a Claim Into Evidence

A backtest is a hypothesis, not a result. Here is exactly how Mercurio is validated — the universe, the execution model, the regime filter, walk-forward testing, and the drawdowns we refuse to hide.

Anyone can produce a beautiful equity curve. Overfit a few parameters to the last two years, ignore slippage, model stop-losses as if they always fill at the exact price, and you can manufacture a chart that climbs to the moon. That chart is a claim, not a result. The entire purpose of Mercurio's backtesting discipline is to tell the difference.

This article walks through how we validate the strategy that Mercurio actually runs. Every number here comes from the same engine — you can read it, and run it, yourself. The full backtest engine lives in portfolio_backtest.py on GitHub.

The strategy under test

Mercurio runs a single edge: trend following on 1-hour bars. Entries require stacked exponential moving averages (12/48, 24/72, 48/144) confirming a direction, with an ADX floor confirming a trend actually exists. Exits use a Chandelier trailing stop (a multiple of Average True Range below the recent high) plus a catastrophe stop. Risk is capped at 1.5% of capital per trade. That is the whole strategy — deliberately simple, because complexity is where overfitting hides.
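The rules above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code in portfolio_backtest.py: the function names are hypothetical, and the ADX floor of 20, the Chandelier multiple of 3.0, and the 22-bar lookback are placeholder assumptions, since the article does not state those values. Only the EMA pairs and the 1.5% risk cap come from the text.

```python
EMA_PAIRS = [(12, 48), (24, 72), (48, 144)]  # fast/slow pairs from the article


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def long_signal(closes, adx, adx_floor=20.0):
    """True when every fast EMA is above its slow EMA and ADX clears the floor.

    adx_floor=20.0 is an assumed placeholder, not the article's value.
    """
    stacked = all(ema(closes, f)[-1] > ema(closes, s)[-1] for f, s in EMA_PAIRS)
    return stacked and adx >= adx_floor


def chandelier_stop(highs, atr, multiple=3.0, lookback=22):
    """Trailing stop: recent high minus a multiple of ATR (assumed parameters)."""
    return max(highs[-lookback:]) - multiple * atr


def position_size(capital, entry, stop, risk_fraction=0.015):
    """Whole shares such that a stop-out loses at most risk_fraction of capital."""
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        return 0
    return int(capital * risk_fraction / risk_per_share)
```

For example, with $25,000 of capital, an entry at 100 and a stop at 95, `position_size(25000, 100.0, 95.0)` risks $375 across $5 per share, which is 75 shares. Note that this sizing assumes the stop fills at the stop price; a gap can make the realized loss larger than 1.5%.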

We test it on a universe of large, liquid US equities — names like MU, CRWD, PANW, ASML, and ARM — using $25,000 of paper capital, the same figure the live paper account is capped at.

The honest execution model

This is where most backtests quietly lie. Two assumptions make or break realism, and we model both pessimistically.

Slippage. Every entry and exit pays 0.15% per side — roughly 0.3% round-trip — a more honest figure for liquid names than the 0.1% many backtests assume.

Gap-through-stop fills. A stop-loss does not guarantee you exit at the stop price. When a stock gaps down overnight, you fill at the open, not the stop. Modeling this — filling at the open price across session breaks rather than at the stop — was the single most important realism fix we made. It costs roughly 32 percentage points of return over our test window compared to the fantasy version where stops always fill exactly. We keep it on.
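As a rough sketch of the fill rule for a long position, assuming OHLC bars and a hypothetical function name (the real engine may structure this differently): if the bar opens at or below the stop, the exit fills at the open; otherwise it fills at the stop only if the bar trades down to it. The 0.15% per-side slippage is applied against the trader in both cases. For simplicity the sketch applies the gap rule to any bar, whereas the article describes it across session breaks.

```python
SLIPPAGE = 0.0015  # 0.15% per side, from the article


def long_stop_fill(stop, bar_open, bar_low, slippage=SLIPPAGE):
    """Exit price for a long position's stop on this bar, or None if not hit.

    Gap case: the bar opens at or below the stop, so the fill is the open.
    Normal case: the bar trades down through the stop, so the fill is the stop.
    """
    if bar_open <= stop:
        raw_price = bar_open
    elif bar_low <= stop:
        raw_price = stop
    else:
        return None
    # Selling, so slippage lowers the realized price.
    return raw_price * (1 - slippage)
```

With a stop at 95 and a bar that opens at 90, the exit is about 89.865 rather than the 94.8575 a no-gap model would book. That difference, repeated across every gap in the test window, is where the lost return comes from.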

Why this matters

The difference between a backtest that assumes perfect stop fills and one that models gaps is the difference between a strategy you can trust and one that will surprise you with real losses. The gap model alone turned several 'winning' configurations into losers.

The regime filter — and the bug that inflated our numbers

Mercurio only takes long positions when the broad market is in a confirmed uptrend: the S&P 500 must close above both its 50-day and 200-day moving averages for ten consecutive sessions. In bear or choppy regimes, the engine sits in cash.

While validating this, we found a bug that should make you suspicious of every backtest you have ever seen. Our engine had been computing the S&P's 50/200 'day' averages from 1-hour bars — roughly 7 and 28 trading days, five to seven times too fast. In a two-year bull market this barely shows; over a full cycle it is fatal, because the filter switches regimes far too eagerly. Fixing it to use genuine daily 50/200-day averages (mirroring the live bot) erased about 31 percentage points of phantom return from one multi-year test. We would rather find that ourselves than have the market find it for us.
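A minimal sketch of the corrected filter, under two labeled assumptions: the averages are simple moving averages (the article says only "moving averages"), and a regular US session contributes about seven 1-hour bars. The function names are hypothetical. The key point is that the input must be daily closes, not hourly bars.

```python
def sma(values, window):
    """Simple moving average of the last `window` values, or None if too short."""
    if len(values) < window:
        return None
    return sum(values[-window:]) / window


def bull_regime(daily_closes, fast=50, slow=200, confirm=10):
    """True if each of the last `confirm` daily closes is above both averages.

    Each day is compared against the averages as they stood on that day.
    """
    if len(daily_closes) < slow + confirm - 1:
        return False  # not enough history to confirm
    for offset in range(confirm):
        history = daily_closes[: len(daily_closes) - offset]
        close = history[-1]
        if close <= sma(history, fast) or close <= sma(history, slow):
            return False
    return True


def effective_days(window_bars, bars_per_session=7):
    """Trading days actually covered when a 'day' window is fed hourly bars.

    bars_per_session=7 is an assumption (a 6.5-hour session as seven bars).
    """
    return window_bars / bars_per_session
```

Under that assumption, `effective_days(50)` is about 7.1 and `effective_days(200)` is about 28.6, which matches the "roughly 7 and 28 trading days" of the buggy version.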

Walk-forward, not curve-fitting

The cardinal sin of strategy development is tuning parameters until the backtest looks great, then being shocked when live results don't match. The defense is walk-forward analysis: optimize on an in-sample (IS) period, then test — untouched — on a later out-of-sample (OOS) period the optimizer never saw.
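The single in-sample/out-of-sample split described here can be sketched as follows. The names `split_by_date` and `walk_forward`, and the shape of the `score` callback, are hypothetical; the sketch only illustrates the discipline of choosing parameters on the earlier window and scoring the later window exactly once.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Split (date, value) rows into in-sample (< cutoff) and out-of-sample."""
    in_sample = [r for r in rows if r[0] < cutoff]
    out_of_sample = [r for r in rows if r[0] >= cutoff]
    return in_sample, out_of_sample


def walk_forward(rows, cutoff, candidates, score):
    """Pick the best candidate on in-sample data, then score it out-of-sample.

    `score(rows, params)` is a caller-supplied function returning a number
    where higher is better (for example, a Sharpe ratio from a backtest run).
    The out-of-sample rows are never shown to the selection step.
    """
    in_sample, out_of_sample = split_by_date(rows, cutoff)
    best = max(candidates, key=lambda params: score(in_sample, params))
    return best, score(in_sample, best), score(out_of_sample, best)
```

A toy usage, with made-up rows and an arbitrary cutoff chosen purely for illustration:

```python
rows = [(date(2023, 1, 1), 1.0), (date(2023, 6, 1), 2.0), (date(2024, 7, 1), -1.0)]
leverage_score = lambda data, k: k * sum(v for _, v in data)
print(walk_forward(rows, date(2024, 6, 1), [1, 2], leverage_score))
```

Here the candidate that looks best in-sample is the one that loses the most out-of-sample, which is the failure mode the split is designed to expose.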

We ran exactly this. The in-sample period (2021–2024, including the 2022 bear market) and the out-of-sample period (2024–2026) were kept strictly separate. Most of the 'improvements' we tried — wider trailing stops, different exit widths, exposure caps — looked spectacular on one window and fell apart on the other. They were mirages. Only one structural change generalized across both windows:

In-sample return (incl. 2022 bear): +52.9%
In-sample Sharpe: 1.13
Out-of-sample return: +24.1%
Out-of-sample Sharpe: 0.58

That validated configuration — long-only, with the 10-day bull confirmation — is what Mercurio runs. The in-sample Sharpe clears the project's 1.0 bar; the out-of-sample Sharpe of 0.58 honestly does not, which is one reason we are still in paper validation rather than live.

The two-year result

Replaying the live configuration over June 2024 to June 2026 with all of the realism above produces this equity curve. It is not a straight line — and that is the point.

[Equity curve: $25,000 start to $38,278 end, 2024-06-01 to 2026-05-31]

Total return: +53.1%
Sharpe ratio: 1.03
Profit factor: 1.37
Win rate: 55.9%
Max drawdown: -29.3%
Trades: 381

The drawdown we don't hide

Notice that the curve falls before it rises. Through late 2024 the portfolio sank to about $18,175 — roughly 27% below its starting capital — and spent months underwater before recovering. The maximum peak-to-trough drawdown over the window was 29.3%. A real person watching $25,000 fall below $18,200 has to keep their nerve. Any backtest that shows you only the smooth final number is hiding the part that actually tests you.
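The two figures in this paragraph measure different things: the 27% is the low relative to starting capital, while the 29.3% is measured from the running peak of the equity curve. A minimal sketch of both calculations, with hypothetical function names:

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest equity seen so far
        worst = min(worst, value / peak - 1.0)
    return worst


def change_from_start(equity_value, starting_capital):
    """Fractional change of an equity value relative to starting capital."""
    return equity_value / starting_capital - 1.0
```

Using the article's numbers, `change_from_start(18175, 25000)` is about -0.273 and `change_from_start(38278, 25000)` is about 0.531. A maximum drawdown larger than the 27% dip below starting capital is what you would expect if the portfolio first rose above $25,000 before falling.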

We tried hard to engineer that drawdown away — exposure caps, a close-on-breach circuit breaker, volatility targeting. Each one either failed to help or made the out-of-sample result worse. The drawdown is a structural property of long-only momentum on a tech-heavy universe during sharp corrections. We document it rather than pretend we solved it.

The five-year reality

The number most bots will never show you

Over a full five-year cycle (2021–2026) that includes the 2022 bear market, the same strategy was net negative: about -23% total, a negative Sharpe, and a 52% drawdown. Trend following on this universe works in sustained bull markets and struggles everywhere else. The strong two-year window is real, but it is not the whole story.

This is why the regime filter exists, why we are long-only, and why we sit in cash for long stretches. It is also why Mercurio is in a 12-month paper trial instead of trading live capital. We are testing whether the validated configuration holds up going forward — not assuming it will.

Reproduce it yourself

None of this asks for your trust on faith. The engine, the strategy, the risk manager, and the regime logic are all on GitHub. The backtest is deterministic: same configuration, same window, same numbers.

If you change one realism assumption (turn off the gap model, speed up the regime average), you will see the numbers improve, and you will understand exactly why we leave those assumptions in place.


Disclaimer. Every figure in this article is from a historical simulation using paper capital. It is not live trading, not a forecast, and not financial advice. Past performance does not guarantee future results. Trading involves substantial risk of loss.