The Mudder Report: Backtest Audit Update 4/10/22 - 4/15/22 (Unlocked)

Apr 17, 2022

I’ve decided to make this post public because we’re at a crossroads and I want everyone to understand why that is and what we’ve done about it. May it help you in your own hunt for the holy grail of automated trade strategy.

At the end of all posts I ask for feedback/questions. Last week I got some rather negative feedback from one of the traders I work with (thank you Justin). In particular, I was told that the audit process is too complicated. The good news is that I’m old enough to value feedback, however it may come. I was also told by one of our subscribers that the backtest score is confusing. So, we’re going to streamline the backtest score process.

Before reviewing the changes, let’s review where we stand from last week.

Last Week

Last week we went in depth to show how slippage works with actual examples. Specifically, we debunked any notion that slippage is inherently bad. From a profit perspective it can be either good or bad. Slippage only looks bad across the board if you define “bad” as any variance from the forecast (the backtest), but that definition lacks nuance and can lead to missed opportunities. In actuality, the only strategies you need to be concerned about are those that are skewed toward negative slippage.
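To make “skewed toward negative slippage” concrete, here is a minimal sketch in Python. It assumes each live trade can be matched one-to-one with a backtest trade and that slippage is measured as live profit minus backtest profit, so a negative number means the live trade did worse. The function name and the sample numbers are hypothetical and are not taken from any of our strategies.

```python
from statistics import mean


def slippage_summary(backtest_pnl, live_pnl):
    """Summarize signed slippage for trades matched one-to-one.

    Assumption: slippage = live profit - backtest profit, in dollars.
    Positive means the live trade beat the backtest; negative means it lagged.
    """
    if len(backtest_pnl) != len(live_pnl):
        raise ValueError("trade lists must be the same length")
    if not backtest_pnl:
        raise ValueError("need at least one matched trade")

    slips = [live - bt for bt, live in zip(backtest_pnl, live_pnl)]
    negative = [s for s in slips if s < 0]
    return {
        "mean_slippage": mean(slips),           # below zero: skewed negative
        "share_negative": len(negative) / len(slips),
        "total_slippage": sum(slips),
    }


# Hypothetical per-trade profit in dollars, for illustration only.
backtest = [50.0, -20.0, 30.0, 40.0]
live = [45.0, -15.0, 35.0, 20.0]
summary = slippage_summary(backtest, live)
print(summary)
```

In this made-up sample half the trades slip in each direction, yet the mean is below zero because one negative slip is much larger than the positive ones. That is the kind of skew to watch for.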

What if there was a way to carve out this sliver of bad pie? If there was a way to know which strategies were prone to negative slippage, we could understand the attributes that confer these results and steer away. I spent hours thinking about how to tackle this question more directly without conducting a full audit. Then I imagined that I had an answer. Right or wrong, it would likely only hinder future strategy development, so it’s six of one and half a dozen of the other.

What we do know is that slippage of any kind is especially prevalent when:

there is a high number of trades per day
the average time in the market is very low (less than 15 minutes)
there’s a high degree of volatility
you are trading more than 2 to 3 contracts per trade
average trade net income is less than $20
there’s a high discrepancy between backtest results run on standard vs high order fill resolution
- Comparing both standard and high order fill resolution is a quick way to understand what impact bar formation has on the strategy.
- The image above is a screenshot of the historical fill settings available within the Strategy Analyzer on NT8. It can be set to high or standard. If there’s a large difference in results between the two, the strategy has a high backtest risk.
- Keep in mind that NT8 does not allow high order fill resolution for strategies with more than one data series (though it can be programmed in as we’ve done with our strategies). From a practical standpoint this means that strategies with only one data series are less prone to backtest risk, at least for NT8’s backtest engine.
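One way to put a number on the standard vs high comparison is a relative gap between the two net profit figures. The sketch below is in Python and works on figures copied by hand from two Strategy Analyzer runs; it does not call NT8. The function name, the 25% cutoff and the sample figures are all assumptions for illustration.

```python
def fill_resolution_gap(standard_net_profit, high_net_profit):
    """Relative gap between standard- and high-resolution backtest net profit.

    Returns a value in [0, 2]: 0.0 means the two runs agree exactly.
    The larger absolute figure is used as the baseline so the result
    does not depend on which run is passed first.
    """
    baseline = max(abs(standard_net_profit), abs(high_net_profit))
    if baseline == 0:
        return 0.0
    return abs(standard_net_profit - high_net_profit) / baseline


# Assumed cutoff for "a large difference"; tune to taste.
LARGE_GAP = 0.25

# Hypothetical net profit from the same strategy run at each setting.
gap = fill_resolution_gap(12_000.0, 7_500.0)
high_backtest_risk = gap > LARGE_GAP
print(f"gap={gap:.3f}, high backtest risk: {high_backtest_risk}")
```

A relative measure is used instead of a dollar difference so that strategies with very different profit levels can be compared on the same footing.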

We’ve also learned that slippage can include the loss or addition of an entire trade, which is what we found in Strategy 10 (more on this in a moment).

Finally, we’ve noticed that the results from the backtest and the simulated live test on the virtual server can be very different from the results of the virtual server and Collective2. Since the virtual server feeds Collective2, there should be little variance in timing between the two. We expect variance in price, but not in time. Why is this an issue? It means that the signals are firing off to Collective2 as they should and are in line with actuals, but the app that’s feeding the signals is not showing the same trades in its trade report. The most logical conclusion is that trades are happening live, but not registering on the virtual server for some reason. Again, we find ourselves in the weeds. We’re working with Collective2 to understand what the issue is, but I think it’s time to climb a mountain: we need to zoom out.

At the top of the mountain…

Indeed, this hunt has been an uphill battle, but we’re making strides. We’ve identified our main enemy: negative slippage. We’ve also developed a checklist for understanding which strategies are more prone to it than others. A strategy that is less prone to slippage of any kind:

makes fewer than 5 trades per day, and those trades are spaced apart
has a high ‘average time in the market’; the higher the better
makes more than $100 per trade
has a small discrepancy between backtest results run on standard or high order fill resolution
only uses one data series
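The checklist above can be sketched as a simple count of red flags. This is a Python illustration, not our official scoring method: the 0 to 5 scale, the class and function names, and the 25% fill resolution cutoff are assumptions. The 5 trades per day, 15 minute and $100 cutoffs come from the lists in this post. How far apart the trades are spaced is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class StrategyStats:
    trades_per_day: float          # average number of trades per day
    avg_minutes_in_market: float   # average time in the market, in minutes
    avg_net_per_trade: float       # average net income per trade, in dollars
    fill_resolution_gap: float     # relative gap, standard vs high resolution
    data_series_count: int         # number of data series the strategy uses


def checklist_flags(s, max_gap=0.25, min_minutes=15.0):
    """Return one True/False red flag per checklist attribute.

    max_gap is an assumed cutoff; min_minutes follows the post's
    "less than 15 minutes" description of a very low time in market.
    """
    return {
        "too_many_trades": s.trades_per_day >= 5,
        "short_time_in_market": s.avg_minutes_in_market < min_minutes,
        "low_profit_per_trade": s.avg_net_per_trade <= 100,
        "fill_resolution_gap": s.fill_resolution_gap > max_gap,
        "multiple_data_series": s.data_series_count > 1,
    }


def risk_score(s):
    """Count the red flags: 0 is the lowest backtest risk, 5 the highest."""
    return sum(checklist_flags(s).values())


# Hypothetical strategies, for illustration only.
busy = StrategyStats(12, 6, 18, 0.375, 2)
calm = StrategyStats(2, 90, 150, 0.05, 1)
print(risk_score(busy), risk_score(calm))
```

Counting flags keeps the score easy to explain: each point maps back to one line of the checklist.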

So, based on this knowledge, how is the process changing?

Each of us is going to give each strategy a score based on these attributes. I will take an average and provide it with the performance update next week. I think this will get us out of the weeds while preserving what we’ve learned about backtest risk. Ultimately, the goal is to give each strategy a score that can help to assess whether or not the backtest poses real risk. Instead of the score coming from an audit, it will come from an assessment of how well the strategy matches the attributes above. So, instead of eliminating strategies with a high backtest risk, we’re identifying them and submitting them for additional testing.
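As a rough sketch of the averaging step, the Python below takes each person’s scores, averages them per strategy, and lists the strategies whose average is 3 or more (the cutoff for further testing used later in this post). The rater names, strategy names and scores are hypothetical, and the data layout is an assumption.

```python
from statistics import mean


def average_scores(scores_by_rater):
    """Average each strategy's score across raters.

    scores_by_rater maps rater -> {strategy: score}. A rater may skip a
    strategy; the average then uses only the scores that were given.
    """
    collected = {}
    for scores in scores_by_rater.values():
        for strategy, score in scores.items():
            collected.setdefault(strategy, []).append(score)
    return {strategy: mean(vals) for strategy, vals in collected.items()}


# Hypothetical ratings, for illustration only.
ratings = {
    "rater_a": {"Strategy A": 5, "Strategy B": 1},
    "rater_b": {"Strategy A": 4, "Strategy B": 2},
    "rater_c": {"Strategy A": 5, "Strategy B": 2},
}

averages = average_scores(ratings)

# Strategies at or above the cutoff go on for additional testing.
TEST_CUTOFF = 3
needs_testing = sorted(name for name, avg in averages.items() if avg >= TEST_CUTOFF)
print(averages, needs_testing)
```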

Strategy 10

It is impossible to wrap this up without a discussion about Strategy 10. Strategy 10 started this audit process and I am truly grateful for its lessons.

Strategy 10 was our weekly poster child. It made a large number of trades, but was profitable every week for almost 20 weeks in a row in the backtest. When we ran the strategy live, however, it didn’t perform well. I’ve explained the issue at length in previous posts so I won’t go into it here, but in a nutshell, what we found is that the strategy was prone to negative slippage. Indeed, it had all the attributes of a strategy with a high degree of backtest risk.

What our preliminary audit has done is show us that slippage exists and, for the most part, is in balance. When it’s not, we’ve been able to improve results by adding a 10 tick limit order to both sides of the trade. This reduces the number of trades taken, but it also improves backtest accuracy. We are also testing whether changing the size of the data series, e.g. from 36 range to 72 range, or the type of data series, e.g. from range to minute, might help.

So what happened when we used a limit order on Strategy 10? It performed much better, but the net profit was still off by a large amount. I went through every trade made last week and compared the live results with the backtest. What I found was more than slippage; it was a complete miss. That is, about 10% of the trades were skipped altogether, and 60% of the missed trades were the result of an exit trade. Furthermore, they all had a negative impact on the live test. Why was this happening?
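The trade-by-trade comparison described above can be sketched in Python as follows. This is an illustration, not the spreadsheet I actually used. It assumes each order is recorded as a (time, kind) pair where kind is "entry" or "exit", that live and backtest orders match on exact time, and that a missed order “resulting from an exit trade” means the missed order is itself an exit. The sample orders are hypothetical.

```python
from collections import Counter


def missed_trade_report(backtest_orders, live_orders):
    """Find backtest orders that never showed up in the live test.

    Orders are (time, kind) tuples. Exact-time matching is an assumption;
    real data would likely need a tolerance window.
    """
    remaining_live = Counter(live_orders)
    missed = []
    for order in backtest_orders:
        if remaining_live[order] > 0:
            remaining_live[order] -= 1   # consume the matching live order
        else:
            missed.append(order)

    missed_exits = sum(1 for _, kind in missed if kind == "exit")
    total = len(backtest_orders)
    return {
        "missed": missed,
        "missed_share": len(missed) / total if total else 0.0,
        "exit_share_of_missed": missed_exits / len(missed) if missed else 0.0,
    }


# Hypothetical orders, for illustration only.
backtest = [
    ("09:35", "entry"), ("09:50", "exit"),
    ("10:20", "entry"), ("10:45", "exit"),
    ("13:05", "entry"), ("13:30", "exit"),
]
live = [
    ("09:35", "entry"), ("09:50", "exit"),
    ("10:20", "entry"),
    ("13:30", "exit"),
]

report = missed_trade_report(backtest, live)
print(report)
```

Splitting missed orders by entry and exit is what pointed us at the exit side of the trade, so the report keeps those two figures separate.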

The issue is that we placed a limit order on the front-end of each trade, but not on the back-end. That is, we placed a limit order on entry, but not on exit. This week we’ll see if placing a limit on the exit trade will close the gap.

Summary

So we have a new process. It’s easier to understand, but it doesn’t compromise the integrity of our work or the goal of the backtest audit. Again, I want to thank those of you who reached out. Feedback is critical in this effort and these changes would not have happened otherwise.

Next week we’ve got a lot on deck:

We’re live testing Strategy 10 with a limit order on the front and back end. We’re also live testing Strategy 10 using various data series.
We’re live testing Strategy 47. It’s one of our most interesting/promising strategies to date, but it has a high backtest score so we’re testing it before we publish. Either way, we’ll publish the results to subscribers on Friday.
We’re working on an updated performance chart with a backtest score for all strategies. The goal is to have everyone’s feedback by Thursday and to assimilate that feedback into the chart by Friday.

We will continue to test all strategies with a backtest score of 3 or more on the virtual server and possibly Collective2. I know the process has been a bit painful for some, but I truly believe it will help us to leapfrog others on the hunt. The hope is to be back on track and fully devoted to strategy creation by this time next week.

The Week Ahead

From a daily trading perspective, this week will likely be much slower. Look for the kind of interference pattern in market structure we had two weeks ago in equities. I don’t plan to be in the market this week, but if I do trade it will be on Wednesday and Thursday. We’ve got minor announcements on Wednesday and a lot of speakers on Thursday, but no major rate announcements. Still, there’s a lot of scared money out there so central banking speakers will have more impact than normal. And, with cuts in central bank spending/intervention, I wouldn’t be surprised if the market takes a dive next week. Remember, we still don’t know what the fallout will be from trying to contract the market while maintaining liquidity. Volatility is one thing, but what comes next?

The release of Strategy 47 was pushed to April 22 for testing. All subscriptions have been extended by one week to compensate. If you have any questions or suggestions, please reach out by responding to this post or emailing at automatedtradingstrategies@substack.com.

Trade well

There is no guarantee that our strategies will have the same performance in the future. Some may perform worse and some may perform better. We use backtests to compare historical strategy performance, but there are no guarantees that this performance will continue in the future. Trading futures is extremely risky. If you trade futures live, be prepared to lose your entire account. We recommend using our strategies in simulated trading until you/we find the holy grail of trade strategy.