The Mudder Report 4.0: Forward Test Update & The Challenges of Working With Market Simulators?

The forward test has evolved; we’ve gone from ~35 strategy variations in the forward test to ~120.

Oct 03, 2023


Important: There is no guarantee that these strategies will have the same performance in the future. I use backtests to compare historical strategy performance. Backtests are based on historical data, not real-time data, so the results shared are hypothetical, not real. Trading futures is extremely risky. If you trade futures live, be prepared to lose your entire account. I recommend using these strategies in simulated trading until you/we find the holy grail of trade strategy.

We haven’t found the holy grail of automated trade strategy yet, but we get closer with every strategy. Click here for links to all strategy descriptions and the most recent performance chart from the backtest.

Edouard Manet 053, The Races at Longchamp, 1864, Art Institute of Chicago

The Mudder Report is Evolving

If I were to rank strategy tests by reliability, from most to least reliable, the order would be:

Live Forward Test - Live data using a live account. Provides the best data to use. No part of this test makes use of a simulation; however, you run the risk of losing your entire account, so the risk is great.
Simulated Forward Test - Live data using a dummy or simulated account. Provides the most accurate data you can use before going live. The account is simulated, so there’s still some degree of market risk, but there’s no risk of losing your account.
Market replay simulation - Provides greater accuracy than high-order resolution, but lower accuracy than the use of real-time data.
High-order resolution simulation - Provides less accuracy than market replay, but greater accuracy than the standard simulation.
Standard simulation - Provides the fastest results in exchange for lower accuracy. This is only an issue where detail is required. For example, a strategy that uses daily data is going to require less detail than a strategy that calculates bar data every 5 minutes. You still need to verify the trade matches the command.

For an overview on the difference between high-order resolution and standard simulation, click here.

Which one is best to use?

I think we can all agree that the best is always live, but it also puts your capital at risk. Second best is to conduct a simulated live forward test, but how do you decide which strategies to forward test?

Our process has evolved over time. Today ATS uses the standard simulation to find strategies for the simulated live forward test. You can read more about how to start your own forward test here.

In general, the more reliable the test, the more time it takes to run. That’s the conundrum: backtest risk decreases as accuracy increases, but accuracy costs time. In the same way that the hunt for faster, low-latency data fueled the hunt for ML tools that can predict the future, the pinnacle of the simulation is to approach reality as closely as possible without having to trade live. This is why we’re running a forward test using live data on a dummy account.

Before sharing the results of the forward test, I want to talk a bit about backtest risk and the challenges of working with an equity simulator.

Working With Market Simulators? Verify, Verify, Verify

Photo by Viktor Talashuk on Unsplash

Backtest risk is a catchall for anything that creates a divergence between the backtest and what actually happened or the backtest and what you expect to happen. This includes:

alpha decay/changing markets
inaccurate historical data
inaccurate data pulls
inaccurate bar calculation

I recently published Strategy 73. The results are based on a standard backtest; however, we’ve eliminated three of the four sources of backtest risk listed above, or 75% (there is nothing we can do about alpha decay). That includes correcting for an inaccurate data pull, which is what was happening in Strategy 46 (its predecessor). That doesn’t mean the strategy will perform the same in the future. There’s always risk, especially when trading futures, but that risk is not associated with the simulation.

So the lesson of Strategy 73 (and its predecessor, 46) is to always verify with trade data. It exposes a truth about working with market simulators that we should all be mindful of.

Here’s an interesting conversation between Peter Brown (Renaissance Technologies) and Raj Mahajan (Goldman Sachs).

The most relevant segment is around 10 minutes in. Brown recounts his early days with Renaissance Technologies. He says that after he studied the trading system he realized "the equity simulator made money, but the trading system lost money."

Sound familiar?

So Brown thought about ways to improve the code in the simulator.

He asked the coders: “How are you going to be sure that the trading algo produces the same answers as the simulator?”

The coders said, “That’s not a problem. We’ll just read over the code very carefully.”

“At that point,” according to Brown, “I realized what the problem was. You can’t verify if a computer program gets the right answer simply by reading it over.” So he and his partner rewrote the entire equity system.

If you were to ask the creators of NT8 (NinjaTrader 8) the same question Brown asked, they would most likely point you to the page Discrepancies: Real-Time Vs. Backtest. It is a good reference point, but it’s primarily focused on explanations rather than fixes.

In the end, the best way to verify if an automated system gets the right answer is to analyze the trade data, not the code. In fact, this is the only way to verify.
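
As a minimal sketch of what analyzing the trade data can look like, the Python below compares the trades a strategy's commands should have produced against the trades that were actually recorded. Everything in it is an assumption for illustration: the Trade fields, the reconcile helper, and the 0.25 price tolerance are hypothetical and are not an NT8 export format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """One completed trade. Field names are hypothetical, not an NT8 format."""
    signal_id: str      # identifier of the strategy command behind the trade
    side: str           # "long" or "short"
    quantity: int
    entry_price: float
    exit_price: float


def reconcile(expected, actual, price_tolerance=0.25):
    """Compare expected trades with recorded trades.

    Returns a list of discrepancy messages. An empty list means every
    expected trade was recorded with the same side and quantity and with
    prices within price_tolerance, and no unexpected trades were recorded.
    """
    issues = []
    actual_by_id = {t.signal_id: t for t in actual}
    expected_ids = {t.signal_id for t in expected}

    for exp in expected:
        act = actual_by_id.get(exp.signal_id)
        if act is None:
            issues.append(f"{exp.signal_id}: expected trade was never recorded")
            continue
        if (act.side, act.quantity) != (exp.side, exp.quantity):
            issues.append(f"{exp.signal_id}: side or quantity mismatch")
        if abs(act.entry_price - exp.entry_price) > price_tolerance:
            issues.append(f"{exp.signal_id}: entry price outside tolerance")
        if abs(act.exit_price - exp.exit_price) > price_tolerance:
            issues.append(f"{exp.signal_id}: exit price outside tolerance")

    # Trades that were recorded but that no command asked for.
    for act in actual:
        if act.signal_id not in expected_ids:
            issues.append(f"{act.signal_id}: recorded trade has no matching command")

    return issues


if __name__ == "__main__":
    # Made-up example data, not results from any strategy.
    expected = [Trade("s1", "long", 1, 4500.0, 4510.0),
                Trade("s2", "short", 1, 4520.0, 4515.0)]
    recorded = [Trade("s1", "long", 1, 4501.0, 4510.0)]
    for line in reconcile(expected, recorded):
        print(line)
```

The point of the design is that the check never looks at the strategy code. It only looks at what the commands said should happen and what the trade log says did happen, which is the comparison Brown's coders wanted to skip.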

To be clear, a backtest is based on historical data that is often flawed. The further back you go, the more detail is lost, and the more flawed it becomes. There is no way to get around that unless you have a team of people cleaning data and you have a large amount of memory for trade objects. The only way to get around the inherent challenges of the simulation is to use actual live data instead of historical data in a forward test. The results are still hypothetical, but you don’t have to worry about the simulation’s ability to store and recall historical data.

Now, let’s get into the forward test.
