Automated Trading Strategies: May 2022 Update - Volatility is King
Over the last 12 months our top strategies made over $2.8M based on NT8 backtest results
Volatility is great for day trading, but what’s good for day trading isn’t always good for automated trading strategies, especially when the simulation is limited, and the simulation will always be limited.
Let’s take a minute to digest what I’m saying here. Most traders fall short in the discipline department, which is why we seek the cover of a good automated system, but the discipline we seek comes with concessions. In particular, we must sacrifice our main advantage: our eyes and brain. All we can do is attempt to capture what our eyes and brain see within the strategy. And, even if we manage to duplicate what our eyes see to perfection, the ‘brain’ of our automated system is always a few steps behind. The natural conclusion is that the simulation will never be as good as the original. But, if all you have is a simulation, and you seek to create the holy grail of originals, the simulation must account for the difference between the two or it will be prone to inaccuracies. This is true for both the strategy, which simulates you as a trader, and the backtest engine, which simulates the market. Volatility makes this task even more difficult. That is, volatility is inversely correlated with simulation or backtest accuracy.
For example, one of our earliest observations regarding backtest accuracy was that we needed to manage the difference between historical and actual bar formation. These errors are most pronounced with complex bar formations like Renko and Range that are not dependent on time. So while I may like to use Renko and Range to make entry and exit decisions as a day trader, the simulation has a hard time recreating the bar formation for automation. Perhaps we shouldn’t be using either one.
Volatility also depends on the market you’re trading in and, like any measure of velocity, is largely a function of time. It might seem that the best way to reap the benefits of volatility without being eaten is to use a data series that is not bound by the limitations of time. While this is helpful from a day trading perspective, it produces weaker signals if our backtest engine is unable to act on them precisely and/or instantaneously. So, the best way to optimize our hunt is to minimize the impact of volatility by using a minute-based data series across several market instruments.
Before sharing how we did this, as well as the performance update and the scoring system we created to help identify strategies that might be more prone to backtest inaccuracies, let’s take a few steps back to review how we got here.
We are on the hunt…
We are on the hunt for the holy grail of automated trade strategy. We define the holy grail of trade strategy as having the following attributes:
Profit factor greater than 3
Annual drawdown less than 3%
Annual return greater than 500%
Maximum daily low of -$1,000
Avg Daily profit greater than $1,000
Less than 5,000 trades annually
Greater than 253 trades annually
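Taken together, these thresholds amount to a simple screen. Here’s a minimal sketch in Python, assuming the backtest summary is available as a plain dict; the key names are illustrative assumptions, not NT8 field names:

```python
def is_holy_grail(m: dict) -> bool:
    """Screen a backtest summary against the holy grail attributes.

    `m` is a plain dict of backtest metrics. The key names are
    illustrative, not taken from any particular backtest engine.
    """
    return (
        m["profit_factor"] > 3                    # profit factor > 3
        and m["annual_drawdown_pct"] < 3          # annual drawdown < 3%
        and m["annual_return_pct"] > 500          # annual return > 500%
        and m["max_daily_low"] >= -1000           # worst daily low of -$1,000
        and m["avg_daily_profit"] > 1000          # avg daily profit > $1,000
        and 253 < m["annual_trades"] < 5000       # trade-count band
    )
```

A strategy either clears every bar or it isn’t the grail; there is no partial credit in this screen.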
Every two months I like to look at how our top automated trading strategies are performing and update you on key findings, highlights, takeaways and what’s in the pipeline. This is our May 2022 update. You can view past updates here:
It’s been well over a year since we started this. With every update, the data grows richer and our observations more insightful.
We’ve learned a lot over the last two months. The biggest lesson we’ve learned is that the simulation is both friend and foe. In truth, we learned this lesson at the beginning of this journey and I discuss it in the post: We Found The Holy Grail of Automated Trade Strategy...Then We Lost It. It set off a host of alarms about the accuracy of backtests and the limitations of NT7 as a backtest engine.
We did a number of things to compensate including upgrading from NT7 to NT8. We also conducted a deep dive into historical bar calculation and determined that it was one of our biggest enemies in the hunt. We digested and adjusted for everything NT8 warned about in terms of backtest accuracy. We updated our metrics and even changed our process. You can read more about exactly what we did here.
After that exercise, we felt like we had a handle on things, particularly the notion of overfitting, which was ultimately a by-product of complexity and the optimization of indicator parameters.
Then, we decided to run some of our strategies on Collective2 and a virtual server located in Chicago. We started off with our best strategy: Strategy 10. It performed horribly. Once again, we halted the subscription to understand why there was such a large discrepancy between our #1 strategy from a backtest perspective and our live results. The issue, upon intense scrutiny, was negative slippage. Ultimately, we found the strategy to be off balance. That is, unlike other strategies that had both positive and negative slippage, Strategy 10 only had negative slippage. This is the case even when the strategy is reversed.
Then we slapped a limit order on both sides of the trade for Strategy 10, and the strategy improved, but still only managed to break even.
Lessons learned:
Some strategies are more prone to negative slippage than others.
Limit orders can help improve performance for strategies that are prone to negative slippage.
There are only two ways to combat negative slippage:
maintain a really high profit factor, perhaps even higher than 3
focus only on strategies that are not prone to negative slippage
So which strategies are not prone to negative slippage? Strategies that:
make fewer than 5 trades per day, with those trades spaced apart
have a high ‘average time in the market’; the higher the better
make more than $100 per trade
use only one data series, preferably minute- or time-based
Another quick test to see if a strategy is prone to negative slippage is something I shared with subscribers when we first started accounting for backtest risk. You’ll want to measure the difference between a backtest on standard and high order fill resolution. The higher the difference, the higher the backtest risk; if the difference is negative it is an indication that the strategy might be more prone to negative slippage.
Next, we created a scoring system based on these attributes/tests to tell us when certain strategies might be more prone to negative slippage. The scoring system is referred to as Backtest Risk and is the last column of every strategy in the performance chart below. The higher the backtest risk score, the more prone the strategy is to negative slippage. The best backtest risk score is 0, meaning that there is no backtest risk or risk of negative slippage. Each score was given based on an assessment of the attributes listed above. I've also added a column that provides the average time in the market. In general, the longer the trade is in the market, the lower the backtest risk. By extension, because strategies with a high number of trades tend to have lower trade times, strategies with a high number of trades also tend to be more prone to negative slippage.
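Since the exact weighting behind the Backtest Risk column isn’t published here, the following is only an illustrative sketch of how such a score could be assembled from the attributes and the fill-resolution test above. One point per risk flag; the thresholds mirror the rules of thumb in the text but are otherwise assumptions:

```python
def backtest_risk_score(trades_per_day: float,
                        avg_minutes_in_market: float,
                        avg_profit_per_trade: float,
                        uses_minute_series: bool,
                        std_minus_highres_pnl: float) -> int:
    """Toy backtest-risk score: 0 means low risk of negative slippage.

    Each risk attribute adds one point. The thresholds follow the
    rules of thumb in the text; the actual scoring behind the
    performance chart is not published, so treat this as a sketch.
    """
    score = 0
    if trades_per_day >= 5:
        score += 1                # frequent, clustered trades
    if avg_minutes_in_market < 5:
        score += 1                # short average time in the market
    if avg_profit_per_trade <= 100:
        score += 1                # thin profit per trade
    if not uses_minute_series:
        score += 1                # Renko/Range-style data series
    if std_minus_highres_pnl < 0:
        score += 1                # high-res fill backtest does worse
    return score
```

A fast scalper on a Range series with a negative fill-resolution difference scores the full 5; a slow, minute-based strategy scores 0.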
Without further ado, here's the most recent update for May 2022 along with the backtest risk score in the last column:
What is this telling us about backtest risk?
Strategies 3, 10 and 47 have the highest backtest risk; Strategies 25, 33 and 46 have the lowest.
Again, a low backtest risk score doesn’t make the strategy bad or good, but it lets us know how much we can trust backtest results. A strategy can have a profit factor less than one, and have a low backtest risk score. It can also have a profit factor greater than 2, and have a high backtest risk score. Ideally, you want a high profit factor and a low backtest risk score.
This necessarily changes the front-end attributes of the hunt. I’ll discuss our new attributes in the conclusion. Meanwhile, let’s look at some of the differences from March to May 2022.
The March update set records. Across the board, new max drawdown records were set, a side effect of the volatility that entered the market and peaked from January to March 2022. We saw a few new max drawdown records hit in the March to May time frame, but not many. Additionally, almost every strategy made more trades, so the volatility continued, and the premium associated with a highly volatile market continued, but not all strategies performed better. Translation: we’re paying more to play the game without the associated reward. Strategy 47 and Strategy 48 are both aimed at creating more reward in volatile markets.
Now, let’s not forget the goal of this tracking process. We’re tracking for overall performance which we’re measuring by profit factor. Overall, we’re looking at a deterioration in profit factor from 1.27 to 1.16.
What’s happening?
We can’t blame this on any one factor. In general, the hunt is riddled with traps and dragons, but there are five main enemies in the dark and scary forest of automated trading:
Overfitting - overfitting is primarily due to optimizations based on indicator parameters. The best way to fight overfitting is to avoid optimizing strategies with indicator parameters.
Alpha decay - alpha decay is reflected in a deteriorating profit factor due to prolonged changes in market structure like market chop and volatility. Over the last year and a half we’ve discovered several best practices to reduce the risk of alpha decay.
Backtest risk - backtest risk is due to the limitation of the simulation or backtest engine. It’s also due to market liquidity or the availability of contracts in the market. The primary concern is negative slippage. Over the last year and a half, we’ve discovered a number of best practices that reduce backtest risk.
Market chop - market chop is a form of volatility and takes the shape of an interference pattern. Hunting in market chop is like hunting in the desert; it takes a special kind of deftness. Time-based strategies tend to eliminate the noise caused by market chop.
Elevated volatility - high volatility is caused by faster/erratic price movements, which increases backtest risk and possibly negative slippage. It also increases slippage in live trades. Time-based strategies that use five or fewer contracts tend to reduce the impact/noise of volatility.
Clearly, it’s time for a change, especially with regard to the data series and front-end attributes used to find the holy grail. Like Renko, Range is a good visual data series. Personally, I use it for day trading, but historically the calculation presents a challenge for the backtest engine. It also lends itself to the requirement for quick action, which is deadly in automated trading. That is, the simulation’s strengths are not agility or speed, but discipline and calculated decision making. If we are going to use the simulation to find the holy grail, we must acknowledge this and adjust the hunt.
With that said, I believe the holy grail of automated trade strategy:
is not the product of optimization
has a data series based on time, 1 minute or higher
has an average time in the market for each trade of at least 5 minutes
has an average $ per trade greater than $100
has a backtest risk score less than 5
has a risk management system that uses cumulative net income as a way to determine contract quantity
never uses more than 5 contracts
closes at 4pm (EST) every day to avoid initial margin
uses a limit order on entry and exit
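The cumulative-net-income sizing rule above can be sketched as follows. The $50K-per-contract step is a hypothetical assumption for illustration; the post doesn’t specify an actual step size, only that sizing scales with cumulative net income and never exceeds 5 contracts:

```python
def contract_quantity(cumulative_net_income: float,
                      income_per_contract_step: float = 50_000,
                      max_contracts: int = 5) -> int:
    """Size the next trade from cumulative strategy P&L.

    Start at 1 contract and add one for each
    `income_per_contract_step` of cumulative net income earned, never
    exceeding `max_contracts`. The step size is an illustrative
    assumption, not a published parameter.
    """
    if cumulative_net_income <= 0:
        return 1  # never scale up while flat or underwater
    qty = 1 + int(cumulative_net_income // income_per_contract_step)
    return min(qty, max_contracts)
```

The point of the design is that position size is earned from realized performance rather than set by optimism: a losing streak automatically shrinks the strategy back to one contract.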
In addition to adjusting the front-end attributes for the holy grail, we also need to expand the instrument list. When we started this hunt it was with a very narrow set of variables. We were more concerned about comparing strategy performance than optimizing performance. We simply did not know enough to be strategic about how to optimize performance without overfitting so we focused on what we could control. Now, based on our observations about which strategies have the best backtest accuracy, we can optimize the performance chart for those attributes, i.e. average time in the market, that have been shown to provide the best results.
From a practical perspective, the strategic question is: Is there a way to adjust all of our strategies to fit at least some of the attributes above even if we have to change the market traded to do so? The answer is yes.
Most of our strategies are based on a 36 range data series. We reran all strategies for comparison to see how they performed with a 1, 5, 10, 15, 30, 45, 60 and 120 minute data series on the following futures contracts:
6B - British Pound (FX)
6E - Euro (FX)
CL - Crude Oil
ES - E-Mini S&P 500
FDAX - DAX
GC - Gold
NQ - E-Mini NASDAQ 100
RTY - E-Mini Russell 2000
YM - Mini Dow
ZB - 30-Year Bond
ZN - 10-Year Note
ZS - Soybeans
ZW - Wheat
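The rerun amounts to a grid sweep over strategies, instruments, and timeframes. Here is a sketch, assuming a hypothetical `run_backtest` callable that stands in for an NT8 batch run and returns a profit factor for each combination:

```python
from itertools import product

TIMEFRAMES = [1, 5, 10, 15, 30, 45, 60, 120]  # minutes
INSTRUMENTS = ["6B", "6E", "CL", "ES", "FDAX", "GC", "NQ",
               "RTY", "YM", "ZB", "ZN", "ZS", "ZW"]

def best_combos(strategies, run_backtest, top_n=10):
    """Sweep every strategy/instrument/timeframe combination and keep
    the combinations with the highest profit factor.

    `run_backtest(strategy, instrument, minutes)` is a hypothetical
    callable standing in for a batch NT8 backtest run.
    """
    results = [
        (run_backtest(s, inst, tf), s, inst, tf)
        for s, inst, tf in product(strategies, INSTRUMENTS, TIMEFRAMES)
    ]
    results.sort(reverse=True)  # highest profit factor first
    return results[:top_n]
```

With 8 timeframes and 13 instruments, every strategy gets 104 reruns, which is why selecting the winners afterward has to be treated as cherry-picking until the next out-of-sample update.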
We then selected the best performing strategies. These are the results:
What can I say about this new “time-based” portfolio? On the whole, backtest risk is much lower. Since we switched from range to minute, the average time in the market is longer. Likewise, profit factor is much higher as expected (at 1.40 up from 1.16) and drawdown is much lower (at 6.90% down from 12.12%). We expected profit factor to be higher because we decreased the frequency across the board. When you decrease the frequency of a profitable strategy’s trades, there’s a good chance you will also increase profit factor. Keep in mind, we cherry-picked the highest performing strategies so the real test will be the July 2022 update.
If we look at profit factor by average across strategies and instruments, GC (Strategy 47) is the highest performer, followed by ZN and ZB.
If we look at the profit factor by sum across strategies and instruments, FDAX is the highest performer, followed by NQ and ZB.
From a profit per trade perspective, FDAX also performs better.
What does this suggest? It suggests that we need to start looking at: GC, FDAX, ZB and ZN in addition to NQ to diversify the portfolio.
Next steps: We’ve got a much lower trade count with this new portfolio, but the spread across multiple instruments gives us the ability to trade several strategies on the same account, which is exactly what we’re going to do. We’ve already started trading the following strategies on the virtual server and C2:
Summary
In the past two months we’ve done a lot. In addition to creating a new testing process, we’ve also created a backtest risk score. Both are aimed at improving and/or mitigating the risk of backtest inaccuracies. We also published:
Automated Trading Strategy #45
Strategy 45e made $317K based on a one year backtest period and a risk management system that uses 1 to 5 contracts. This is our first strategy to use more than one contract and it’s proven to be one of our most consistent strategies.
Automated Trading Strategy #46
Strategy 46e made $202K based on trading 1 NQ contract on a 5 year backtest period. This is our only strategy to have a backtest risk score of 0. Instead of indicators, this strategy is based on event/business cycle studies from the NDX (NASDAQ-100 Index). The NDX is the underlying instrument of the NQ (E-mini Nasdaq futures) so whatever impacts the NDX will naturally impact the NQ.
Automated Trading Strategy #47
Strategy 47 only made $7,555 based on a one year backtest period, but it is profitable 97% of the time and has a profit factor of 108. Strategy 47 fits five out of seven attributes on our holy grail list of attributes, and has the highest profit factor of any strategy we’ve ever published, but it has a high backtest risk score. Essentially, it’s a scalping strategy. I published it because even though it is prone to backtest inaccuracies, the profit factor is so high that it allows for it. This is also one of the most complex strategies we’ve ever published.
Automated Trading Strategy #48
Strategy 48 has 409 trades and one of the lowest drawdowns we’ve seen at 1.63%. It has a return on max drawdown of 1,030%, which means it made 10.3x more than the drawdown. Remember, the drawdown isn’t the lowest the strategy has ever gone from its starting point, it’s the lowest the strategy has ever gone from its high. This is why we like to use the max drawdown as a proxy for the balance you need to have in the Sim account to trade the strategy. This is also why we use max drawdown as the basis for return.
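As a quick check on the arithmetic, return on max drawdown is just net profit over the peak-to-trough drawdown, expressed as a percentage:

```python
def return_on_max_drawdown(net_profit: float, max_drawdown: float) -> float:
    """Return on max drawdown, as a percentage.

    Max drawdown here is the largest peak-to-trough decline (not the
    decline from the starting balance), used as a proxy for the
    account balance needed to trade the strategy.
    """
    return 100 * net_profit / abs(max_drawdown)
```

A strategy that nets $10,300 against a $1,000 max drawdown returns 1,030%, i.e. it made 10.3x its drawdown, which matches the Strategy 48 figures above.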
Our next strategy is one of our best yet. Instead of naming what would be Strategy 49, “Strategy 49”, we’ve decided to name it Strategy 0. We refer to it as Strategy 0 because you can use this strategy with any strategy as long as it has a certain profit factor and makes a certain number of trades. We’re using the reliability of the strategy’s cash flows to make trade decisions rather than the reliability of market indicators, which are anything but. I’ll be publishing Strategy 0 on June 1.
Conclusion
"Doctor Noonian Soong, my friend, happens to have been Earth's foremost robotics scientist."
– Geordi La Forge to Data, 2364 ("Datalore")
My favorite show to watch as a teenager was Star Trek: The Next Generation. I would come home and watch two episodes back to back. This was pre-Internet, so timing was important. I had to be in front of the TV by 4pm (with a toasted bagel and grape jelly in hand) to see both episodes.
My favorite episodes were about Data, the fifth attempt by Noonian Soong and his wife Juliana to create an android with advanced artificial intelligence. Each android they created suffered from, and was driven by, a kind of simulation envy.
B-4, as the name implies, came before Data. He was the third attempt. Lore was the fourth attempt. Lore showed advanced cognitive abilities, but had flawed ethical programming. That is, the calculations for being moral were too difficult to duplicate. Data, after a learning period, was advanced at both cognition and ethics, but only at the expense of emotion. Finally, the machine attempted to create a machine: Data created his own child. Her name was Lal. Like her Uncle Lore, Lal was unable to deal with emotion and her positronic brain collapsed. In the end, Data found the ultimate success in the acknowledgment of his own strengths and weaknesses.
The more we learn about the limitations of our hunt, the more we can define our search to fit those limitations. We can create a model that lacks emotion. And, while the simulation may also lack certain capabilities, like the ability to calculate complex bar formations accurately, this is the sacrifice that must be made for the discipline we crave. That’s what we’re doing with this new set of strategies. We’ve optimized our strategies to suit the advantages of the simulation within a predefined set of metrics that have been shown to produce better live results. As we begin to incorporate this new set of attributes into the formulation process, it should help to create our best strategies yet. I’m truly excited about what’s ahead.