The Study · June 19, 2026 · 14 min read

The Honest Reckoning: We Tried to Beat the Market. Here Is What Actually Happened.

We built an AI trading bot, broke it ten different ways, tested it to destruction, and ran thirteen rounds of research looking for an edge. This is the unglamorous, fully-documented result -- including the part where the market won.

This is not a sales page. Mercurio is a study project -- an honest, fully-documented attempt to answer one question that almost nobody selling a trading bot will answer truthfully: can a disciplined, AI-augmented system actually beat the stock market? We spent months on it. We have the bugs, the failed experiments, the backtests, and the verdict. Here is all of it, including the parts that did not work.

The short version

We built it. We broke it ten ways and fixed each one. We tested it honestly. It did not beat the S&P 500: it made less money than the index, with a deeper drawdown, for only a marginally better Sharpe ratio. We then ran thirteen rounds of research across roughly fifty strategies looking for something that would -- and proved, on our own data, that nothing reliably does. That is the result. It is the same wall every honest quant hits, and it is worth far more than a fake win.

What we set out to build

Mercurio began as an AI-augmented, multi-strategy trend-following bot trading about a hundred liquid US stocks, gated by a market-regime filter, with an AI layer for sentiment and risk commentary and hard risk limits on every position. The thesis was reasonable and the engineering was serious. None of that turned out to be the hard part.

Ten ways we broke it (and what each one taught us)

Every system that touches money fails first. What separates an honest project from a dangerous one is whether those failures are hidden or written down. Ours are written down -- as a permanent failure log in the codebase. A sample:

A 'kill switch' that had zero effect: setting a losing strategy's allocation to zero did not actually stop it from trading. No test had ever proven the control worked.
All safety state lost on every restart: kill switches, loss-streak counters and the drawdown breaker reset to defaults on each deploy -- silently re-enabling strategies that had been shut off for poor performance.
A daily reset that un-killed bad strategies: the morning routine blindly re-enabled everything, undoing the previous day's protective shut-offs.
A deployment that nearly liquidated the whole book: the database driver silently rejected a Postgres SSL setting, so the engine saw an empty database, treated every real position as an orphan, and queued market sell orders for the entire portfolio. Caught and cancelled in time.
A single wrong operator (|| instead of ??) that pointed the live frontend at localhost and broke the dashboard for hours.
Position multipliers that compounded toward zero, and an intermarket multiplier that could secretly increase risk above the limit instead of only reducing it.
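
The last failure in that list lends itself to a small illustration. The sketch below is not Mercurio's actual code. It assumes that per-signal multipliers are combined in one place, always applied to the base position size, and clamped so the result can only reduce risk. The names `combined_multiplier` and `position_size` and the 0.25 floor are hypothetical.

```python
from math import prod

MIN_MULTIPLIER = 0.25  # hypothetical floor: stacked reductions cannot shrink a position to nothing
MAX_MULTIPLIER = 1.0   # a multiplier may reduce risk, never raise it above the base limit


def combined_multiplier(multipliers: list[float]) -> float:
    """Combine per-signal multipliers into one factor clamped to [MIN, MAX]."""
    raw = prod(multipliers)  # an empty list yields 1, i.e. no adjustment
    return max(MIN_MULTIPLIER, min(MAX_MULTIPLIER, raw))


def position_size(base_size: float, multipliers: list[float]) -> float:
    """Scale from the base size every time, never from an already-scaled size."""
    return base_size * combined_multiplier(multipliers)
```

With these assumed bounds, an intermarket multiplier of 1.3 is capped at 1.0, and three stacked 0.5 reductions stop at the 0.25 floor instead of drifting toward zero.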

The one that still gives us chills

The empty-database deploy nearly market-sold every open position -- twice -- before a guard caught it. No real money was lost (we trade on paper), but it is the clearest reason this project treats the broker, not its own database, as the only source of truth, and why every risk control now has a test that proves it actually works.
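
A guard of this kind can be sketched in a few lines. This is an illustration under our own assumptions, not the project's real guard: positions are represented as sets of ticker symbols, and `orphans_to_close`, `ReconciliationError` and the 50% threshold are hypothetical names and values.

```python
class ReconciliationError(RuntimeError):
    """Raised when local state and the broker disagree too much to act on."""


def orphans_to_close(broker_positions: set[str], db_positions: set[str],
                     max_orphan_fraction: float = 0.5) -> set[str]:
    """Return symbols held at the broker but unknown to the database.

    Refuses to return anything if the database looks empty, or if too large
    a share of the book would be flagged, because that pattern points to a
    broken database connection rather than genuine orphans.
    """
    if not broker_positions:
        return set()
    orphans = broker_positions - db_positions
    suspicious = (not db_positions
                  or len(orphans) / len(broker_positions) > max_orphan_fraction)
    if suspicious:
        raise ReconciliationError(
            f"{len(orphans)} of {len(broker_positions)} positions look orphaned; "
            "refusing to queue sell orders"
        )
    return orphans
```

The design choice is to fail loudly: an empty local database is treated as evidence that the database is wrong, not that the positions are.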

Then we asked the only question that matters

Not 'is the bot profitable?' -- almost any long-biased strategy looks profitable in a bull market. The real question is harder: does it beat simply buying and holding the S&P 500? We re-ran the strategy with brutally realistic execution -- stops that gap through on overnight moves, real slippage, the true live position cap -- over five years. Then we put it next to the index.

Bot, 5yr (realistic): +68.5%
S&P 500, same window: +80.2%
Sharpe (bot vs index): 0.88 vs 0.79

The bot's risk-adjusted return was a hair better than the index's, but it made less money with a deeper drawdown -- and the index pays dividends the bot's price-only figure does not even count. Over a full cycle that includes the 2022 bear, earlier honest tests were net negative. The plain reading: it did not beat the market.
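
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two risk measures involved. It assumes daily returns, a 252-day annualisation convention and a zero risk-free rate by default. The backtest's exact conventions may differ, and the function names are ours.

```python
import math
import statistics

TRADING_DAYS = 252  # common annualisation convention, assumed here


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualised Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, `max_drawdown([100, 120, 90, 130])` is 0.25: the curve falls from a peak of 120 to 90 before recovering.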

Can you even tell a winning trade from a losing one?

We thought the fix might be a smarter filter -- learn which trades win and skip the rest. So we captured, for over a thousand trades, every indicator the strategy saw at the moment of entry, and asked a model to separate winners from losers. The best single signal scored an AUC of 0.556 -- barely better than a coin flip (0.5). A model trained on 2021-2024 looked brilliant in-sample and added essentially nothing out-of-sample; its most confident picks actually lost more than average. The lesson is blunt: at the moment of entry, a losing trade is statistically indistinguishable from a winning one. You cannot filter out the losers without throwing away the winners.
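
To make the AUC figure concrete: it is the probability that a randomly chosen winning trade gets a higher score than a randomly chosen losing one, so 0.5 means the signal carries no information. The sketch below computes it by direct pairwise comparison, which is fine for about a thousand trades. The function name and inputs are illustrative, not taken from the project.

```python
def auc(winner_scores: list[float], loser_scores: list[float]) -> float:
    """Probability that a random winner outscores a random loser (ties count half)."""
    favourable = 0.0
    for w in winner_scores:
        for loser in loser_scores:
            if w > loser:
                favourable += 1.0
            elif w == loser:
                favourable += 0.5
    return favourable / (len(winner_scores) * len(loser_scores))
```

A signal that ranks every winner above every loser scores 1.0. A signal whose winners sit above and below the losers equally often, as in `auc([0.1, 0.9], [0.5])`, scores 0.5.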

Thirteen rounds looking for an edge

We did not stop there. We tested roughly fifty strategies across thirteen rounds of research, every one walk-forward validated so we could not fool ourselves: cross-sectional momentum, volatility targeting, regime-exit overlays, the overnight-return anomaly, multi-coin crypto trend, cross-asset managed-futures momentum, low-volatility selection, an all-weather risk-parity portfolio across stocks, bonds, gold and commodities. The pattern never broke:

Anything that beat the index on risk-adjusted return did so by taking less risk -- and made less money.
Anything that beat the index on raw return did so with leverage, concentration, or hindsight -- and either carried a far bigger drawdown or collapsed the moment it met data it had not seen.
Nothing beat the S&P 500 on both return and risk, out-of-sample, cleanly.
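
Walk-forward validation, the safeguard applied to every strategy above, can be sketched as a rolling split in which each test window lies strictly after the data used to fit it. This is a generic illustration with hypothetical window sizes, not the project's research harness.

```python
from typing import Iterator


def walk_forward_splits(n_obs: int, train_size: int,
                        test_size: int) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts where its train range ends, so a strategy is
    only ever scored on observations it was not fitted on.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window


# Usage: 10 observations, fit on 4, evaluate on the next 2, then roll forward.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), "->", list(test))
```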

This is the efficient market, measured

Every apparent exception we found was a survivorship, leverage, or overfitting artifact that died out of sample. That is not a failure of effort -- it is the result. It is exactly why the '+100% a year AI bot' you see advertised is, overwhelmingly, fiction.

The one thing that did beat the market -- and the catch

Exactly one clean, documented approach beat the index on returns: hold a 3x S&P 500 fund while the market trades above its 200-day average, and sit in cash otherwise. Over five years it returned about +124.8% versus the index's +80.2%. But read the fine print, because we will not hide it: its risk-adjusted return (Sharpe 0.66) is actually worse than the index's (0.77), its worst drawdown was roughly 52% -- and a single overnight crash, which our test window happened not to contain, can make it far worse. No AI can prevent that, because a crash moves faster than any signal can react.
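
The rule itself is simple enough to sketch. The version below is a simplification under stated assumptions: the 3x fund is modelled as exactly three times the index's daily return, cash earns nothing, and fund fees and trading costs are ignored. The function name and parameters are ours, and it is not the backtest that produced the figures above.

```python
def trend_filter_returns(prices: list[float], window: int = 200,
                         leverage: float = 3.0) -> list[float]:
    """Daily strategy returns for a leveraged trend-following rule.

    On day t the strategy holds the leveraged fund only if the previous
    close was above the average of the `window` closes ending on that
    previous day; otherwise it sits in cash and earns 0.
    """
    returns = []
    for t in range(window, len(prices)):
        sma = sum(prices[t - window:t]) / window  # uses closes up to day t-1 only
        invested = prices[t - 1] > sma            # decided before day t's move
        index_return = prices[t] / prices[t - 1] - 1.0
        returns.append(leverage * index_return if invested else 0.0)
    return returns
```

Because the position is decided from the previous close, a gap down on day t is taken at full leverage before the rule can react, which is the crash risk described above.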

It beats the market on money, not on skill. The extra return is pure leverage, paid for with pain you have to be able to stomach. That is the honest trade -- not an edge.

What this study actually proved

Consistently beating the market is one of the hardest problems in finance. The best fund in history is closed to outsiders, runs on PhD-level infrastructure, and caps its own size. We did not crack it in a study project, and we are not going to pretend we did. What we did do is rarer and, we think, more useful: we measured honestly, documented every failure, and arrived at a true answer instead of a marketed one. For most people, a low-cost index fund at roughly 10-12% a year quietly beats the large majority of active traders -- which is precisely what our own data showed.

What Mercurio is now

A transparent record of the whole journey, plus the leveraged-index strategy above, run on paper capital only. No real money. No guarantees. No hype. The code, the failures, and the research are all here to inspect -- which is the entire point.
