How Can You Tell If An Automated Trading Strategy Will Perform Well Over Time?

How To Test For Robustness

We receive many types of questions. Most are variations on the same theme — ‘how do I know if your strategies will perform the same or better in the future?’

This is basically what one of our readers asked us recently. Here’s the question:

Came across ATS and it's great to see others on the same quest (Godspeed, and great work). Curious: what robustness checks do you run, in evaluating your strategies? If one subscribes and pays the non-trivial monthly fee, are there results for validation measures, like walk forward analysis, Monte Carlo Simulations, Parameter Sensitivity, etc?

First, we appreciate the question. We empathize because it’s a question we’re trying to answer ourselves. The goal of this project is to take the best attributes from the best strategies (best = strategies that perform the same or better than backtest results in the future) and build on them in an effort to discover the holy grail of automated trade strategy. We are sharing the best strategies we find along the way with our subscribers for myriad reasons, but chief among them is to attract others on the same quest.

Second, and this is most important, testing a strategy for robustness is not the same as running it through validation measures like Walk-forward Analysis, Monte Carlo Simulations and Parameter Sensitivity. These validation methods are primarily used to optimize different variables or parameters within the strategy so that it performs at its optimal level over a specific time period. Because of how well our strategies perform, many people assume they are optimized, but they are not. For a list of all strategies, click here. Only the strategies marked in yellow are optimized.

We’ve marked them in yellow because we’ve found them to be weaker strategies. That is, optimized strategies based on parameters, inputs and chart type don’t hold up over time. As much fun as it is to optimize a strategy and test the limits of our validation tool set, it feels like intellectual masturbation at this point because we’ve seen no value in the outcome. So we’ve stopped optimizing strategies. Indeed, we believe it is the parameter/input/chart type optimization process that makes strategies weak.

While validation measures could help with the optimization of more fundamental variables like start date and asset class, they would only confirm what you can determine just as easily with a visual and a few data points. And, in the end, there is no substitute for time, which is why we provide updates every two months. Still, let’s review what the ‘validation measures’ listed in the question above are and how they pertain to automated trading strategies in general.

Monte Carlo simulation

Monte Carlo simulation allows us to optimize a strategy based on a series of random performance results. While random data is considered a strength in data collection, there is nothing random about trade data. You can read more about this in the post: What Can Auction Mechanics & Market Structure Tell Us About Trade Strategy? In it, we explain that trade data follows a certain pattern because it’s based on a live auction. This is actually the case for most financial data. In other words, Monte Carlo Simulation generally results in a poor forecast because it oversimplifies the data. As a result, Monte Carlo Simulation isn’t always helpful and can lead you down the wrong path if you’re optimizing parameters, chart types and inputs. In general, we’ve noticed that optimizing on these variables tends to make strategies weaker (read: future performance is inconsistent with backtest results), whereas optimizing on more functional or foundational variables, like when you start the strategy or what asset you trade the strategy on, makes the strategy stronger. The former is building with toothpicks, the latter is building with tree logs.
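For readers who haven’t seen the technique, one common form of Monte Carlo simulation applied to a backtest reshuffles the order of the historical trade results many times and looks at the spread of outcomes. The following is only a minimal sketch under assumptions: the trade list is made up, and `max_drawdown` and `monte_carlo_drawdowns` are hypothetical helper names, not functions from any platform we use. Note that shuffling treats every trade as independent of the one before it, which is the simplification we take issue with above.

```python
import random

def max_drawdown(pnl):
    """Largest peak-to-trough drop in cumulative P&L, as a positive number."""
    equity = peak = worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def monte_carlo_drawdowns(trades, runs=1000, seed=0):
    """Shuffle the trade order `runs` times and collect each max drawdown."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(runs):
        shuffled = trades[:]
        rng.shuffle(shuffled)
        results.append(max_drawdown(shuffled))
    return sorted(results)

# Made-up per-trade net profit in dollars (illustration only).
trades = [500, -200, 300, -400, 250, -150, 600, -300]

drawdowns = monte_carlo_drawdowns(trades)
print("drawdown in historical order:", max_drawdown(trades))
print("median shuffled drawdown:", drawdowns[len(drawdowns) // 2])
print("worst shuffled drawdown:", drawdowns[-1])
```

The net profit is identical in every run because only the order changes. What varies is the path, and with it the drawdown.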

But you don’t need a randomized validation process to optimize on fundamental variables either. For example, as mentioned above, the start date of a strategy is incredibly important to its performance. It is not unusual for some strategies, like Strategy 33, to start off so well that they never fall below $0. This is due to the timing or start date more than anything else.

Automated Trading Strategies: Strategy #33 (Subscriber only)

Automated Trading Strategies: Strategy #33 (Public version)

The following strategies have a cumulative net profit that is greater than $0. That means the account value never dropped below $0 for the entire backtest period.


As you can see, Strategy 33 has a drawdown of -$18K even though the account value never falls below $5,150. What this confirms is that when you start a strategy can impact its performance. You can also confirm this by looking at the cumulative net profit. This is the annual cumulative net profit chart for Strategy 33:

It starts off well and then pulls back, but never falls below $0. If we were to start the strategy at the beginning of the pullback, the account value would drop below $0, but the strategy itself would perform the same from that point on. In other words, you don’t need to run a random simulation to know there’s a great deal of sensitivity around the timing of your entry, and if a Monte Carlo Simulation helps you to determine the best time to start the strategy, it’s only going to confirm what you can easily determine visually by looking at a chart.
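The start date effect can be checked directly, without any randomization. The sketch below is a minimal illustration, assuming a made-up series of monthly net profit that starts strong, pulls back and then recovers. The numbers are not Strategy 33 results, and `lowest_account_value` is a hypothetical helper name.

```python
def lowest_account_value(pnl, start):
    """Lowest cumulative P&L seen when the strategy is switched on at index `start`.

    The account starts at $0, so a result of 0.0 means it never went negative.
    """
    equity = 0.0
    low = 0.0
    for p in pnl[start:]:
        equity += p
        low = min(low, equity)
    return low

# Made-up monthly net profit in dollars: strong start, pullback, recovery.
monthly_pnl = [6000, 5000, 4000, -5000, -4000, -3000, 3000, 4000, 5000]

for start in range(len(monthly_pnl)):
    low = lowest_account_value(monthly_pnl, start)
    print(f"start at month {start}: lowest account value {low:,.0f}")
```

Starting at month 0, the account never drops below $0. Starting at month 3, the beginning of the pullback, the same trades take the account to -$12,000 before it recovers. The strategy and its drawdown are unchanged; only the entry point differs.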

This is also the reason we started providing charts that look at how net income is distributed by day of week and hour of day. It allows you to determine visually what you would from a random validation on start date. Even then, you’d need a heat map to group the data into something meaningful.

Here’s the distribution of income by hour of day and day of week for Strategy 33:

Hour of Day

Day of Week

Based on this, the best time to enter Strategy 33 is on a Wednesday during the more volatile hours of the day (EST).
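For anyone who wants to build the same kind of breakdown from their own trade log, here is a minimal sketch. It assumes each trade is recorded as a timestamp and a net profit. The trades are made up for illustration and are not Strategy 33 data.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up trades: (exit timestamp, net profit in dollars).
trades = [
    (datetime(2021, 9, 1, 10, 15), 420.0),   # Wednesday
    (datetime(2021, 9, 1, 14, 40), -130.0),  # Wednesday
    (datetime(2021, 9, 2, 10, 5), 90.0),     # Thursday
    (datetime(2021, 9, 3, 11, 30), -210.0),  # Friday
    (datetime(2021, 9, 8, 10, 50), 310.0),   # Wednesday
]

by_hour = defaultdict(float)
by_day = defaultdict(float)
for ts, profit in trades:
    by_hour[ts.hour] += profit            # group by hour of day
    by_day[DAYS[ts.weekday()]] += profit  # group by day of week

for hour in sorted(by_hour):
    print(f"{hour:02d}:00  {by_hour[hour]:>9.2f}")
for day in DAYS:
    if day in by_day:
        print(f"{day:<10}{by_day[day]:>9.2f}")
```

Keying the same totals on the (day, hour) pair instead would give the grid needed for a heat map.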

Multi-Objective Optimization

Another validation measure that we looked at is Multi-Objective optimization, which allows you to analyze multiple objectives (profit factor, max drawdown) and optimize based on the ideal trade-off. Currently, we maximize the backtest based on profit factor, but there are many options, including:

The conundrum of the hunt is the trade-off. You might be able to find a strategy with a profit factor greater than three, but does it have enough trades for the backtest to hold?

It is important to remember that we are not saying that our strategies are robust. We are tracking them along with you, and it is an evolutionary process. The first step in tracking is simply developing a strategy that is profitable. Then you test to see if it holds that profitability over time. Please also remember that our goal is to find the holy grail of trade strategy, which is defined as having the following attributes:

  • Profit factor greater than 3

  • Annual drawdown less than 3%

  • Annual return on max drawdown greater than 500%

  • Maximum cumulative daily low of -$1,000

  • Avg Daily profit greater than $1,000

  • Less than 5,000 trades annually

  • More than 253 trades annually

Our list of criteria continues to evolve as we track the performance of our strategies over time. Within the criteria to define the holy grail of trade strategy is also a test for robustness. As an example, on our last update we noticed that the lower the trade count, the more likely it is that strategy performance will deteriorate over time, so we updated the criteria to include “More than 253 trades annually”. If a strategy has over 253 trades a year (roughly one trade per trading day), we consider it to be a ‘good’ strategy. It is more ‘robust’. We use the performance of our best strategies as a guidepost, cherry-picking the best attributes to build on our research.
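The criteria above can be expressed as a simple checklist. This is a minimal sketch, not our actual tooling: `AnnualStats` and `failed_criteria` are hypothetical names, the sample numbers are made up, and we are assuming “Maximum cumulative daily low of -$1,000” means the cumulative daily low is no worse than -$1,000. Profit factor is computed the standard way, as gross profit divided by gross loss.

```python
from dataclasses import dataclass

def profit_factor(trade_pnl):
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in trade_pnl if p > 0)
    gross_loss = -sum(p for p in trade_pnl if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss

@dataclass
class AnnualStats:
    profit_factor: float
    drawdown_pct: float                # annual drawdown, in percent
    return_on_max_drawdown_pct: float  # annual return on max drawdown, in percent
    cumulative_daily_low: float        # dollars
    avg_daily_profit: float            # dollars
    trades: int                        # trades per year

def failed_criteria(s):
    """Return the names of the criteria that the given stats do not meet."""
    checks = {
        "profit factor greater than 3": s.profit_factor > 3,
        "annual drawdown less than 3%": s.drawdown_pct < 3,
        "annual return on max drawdown greater than 500%": s.return_on_max_drawdown_pct > 500,
        "cumulative daily low no worse than -$1,000": s.cumulative_daily_low >= -1000,
        "avg daily profit greater than $1,000": s.avg_daily_profit > 1000,
        "less than 5,000 trades annually": s.trades < 5000,
        "more than 253 trades annually": s.trades > 253,
    }
    return [name for name, ok in checks.items() if not ok]

# Made-up stats for illustration: strong numbers, but too few trades.
stats = AnnualStats(3.4, 2.1, 640.0, -800.0, 1250.0, 180)
print(failed_criteria(stats))
print(round(profit_factor([300, -100, 200, -50]), 2))
```

In this made-up case the only failed check is the trade count, which is the situation that led us to add that criterion.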

Walk Forward Optimization

Walk Forward optimization is when you use two sets of historical data to optimize and then test a strategy. It is essentially a validation process that splits historical data into two parts: one part is used for optimizing, the other for testing the optimization. The process is then run through forward iterations in time. So if your optimization period is 10 days and your test period is 15 days, it means a strategy will be optimized for 10 days and then the optimized parameters for that strategy will be applied to the following 15 days. This is by far the best validation tool for evaluating optimized strategies.
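To make the windowing concrete, here is a minimal sketch of how the optimization and test periods step forward through the data. It only generates the date windows and does not optimize anything. `walk_forward_windows` is a hypothetical helper, and stepping forward by the length of the test period is one common choice that we are assuming here.

```python
def walk_forward_windows(n_days, optimize_days=10, test_days=15):
    """Yield (optimize_range, test_range) pairs of day indices.

    Each iteration moves forward by `test_days`, so the test windows
    follow one another without overlapping.
    """
    start = 0
    while start + optimize_days + test_days <= n_days:
        optimize = range(start, start + optimize_days)
        test = range(start + optimize_days, start + optimize_days + test_days)
        yield optimize, test
        start += test_days

for optimize, test in walk_forward_windows(60):
    print(
        f"optimize on days {optimize.start}-{optimize.stop - 1}, "
        f"test on days {test.start}-{test.stop - 1}"
    )
```

With 60 days of data, a 10 day optimization period and a 15 day test period, this produces three windows. The parameters chosen on each optimization window are only ever scored on the test window that follows it.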

Validation tools like walk-forward optimization are being used as a way to evaluate results without the wait, but there is no substitute for time. Still, the most important point to be made is that based on our research, which does evaluate performance over time, optimized strategies do not perform like backtest results. That is, the backtest results are over-fitted to historical data so the optimized parameters are too narrow and only work in one particular point in time. We’ve found this to be the case for optimized strategies using walk-forward optimization with an optimization period of 10 days and a test period of 15 days, as well as with an optimization period of 1 day and a test period of 1 day. Either way, the walk-forward optimization proved unsuccessful. Both cases led to a strategy with a lower profit factor. The only optimization that produced better results was the traditional one year optimization that tests parameter sensitivity, but again, this leads to overfitting, which weakens the strategy. You can read more about what we’ve done to limit over-fitting in the posts: Over-fitting: What is it and what can we do about it and What Are We Doing To Ensure Backtest Accuracy?

From this perspective, it makes sense that over-fitting is a common issue for those on the hunt, but it is also a trap. Folks that are attracted to automated trading strategies are often highly analytical. We get off from seeing how much you could have made under perfect conditions, but the holy grail of trade strategy can’t have the same parameters in one point in time as it does in another. By its very nature, the holy grail of trade strategy must cast a wide net. It must mirror the thing it hunts, which is flexible and unconstrained. Optimizing the parameters within a strategy would be like expecting to hunt in the forest under the same conditions every time — this will never be the case. The only thing that’s certain about the nature of the hunt is that it will rarely be the same. But how long does it take to change?

We already know that traditional one year optimizations don’t work. We also know that short-term walk-forward optimizations don’t work. Perhaps we need to be even more granular. The question is: Is there a way to create a strategy that continues to automatically optimize the parameters of a strategy at the close of every bar for the last ten periods, i.e. ten minutes on a one minute chart, and then apply those new settings to the next bar?

There is only one indicator we know of on the market that does this. That is, instead of optimizing a strategy in one year backtest results, and then using those results as a way to place parameters on the future, we can optimize a strategy as close to the event as possible. In this way, the optimization is still based on backtest results, but the data used in the backtest is based on intra-day data. We tried this indicator out for ourselves and had some interesting results. Subscribers can click here for an overview of what we found.

Now back to the original question:

How Do We Know Our Strategies Will Perform Well Over Time?

We all want to believe that there’s a way to measure the forward strength of a model, but as we’ve explained above, validation measures are used primarily to optimize a strategy rather than test its strength. Market data is not random, but based on auction mechanics. The same tools that we use to validate and test the quality of data in a random data set can’t always be applied to the market. The best you can do is make predictions about the future based on strategy performance and then track that performance, as well as the rationale for your predictions, over time. This is why we publish updates every two months to track the progress of strategies over time. You can view these updates here:

The next update will be in November.

Another way to test for robustness is by looking at how well the strategy performs over different chart types (e.g. range, minute, tick) and asset classes (e.g. stocks, futures, bonds, crypto). For an overview of how our strategies work with cryptocurrencies, click here.

For example, Strategy 33 performs well across almost all futures contracts and chart types. The following chart shows the performance stats of Strategy 33 for 12 other futures contracts. We were shocked by the performance of the RTY (Russell 2000 Index) futures contract.


That said, every trader knows that markets have different personalities. That’s why most traders familiarize themselves with only one or two markets. For example, a slower market (low price velocity) is going to require a shorter data series than a faster market. We’ve also found that while adjusting parameters to meet the specific needs of a time period leads to over-fitting, adjusting a strategy to meet the specific needs of a particular market makes the strategy stronger. This is why we focus on the NQ (NASDAQ) futures contract. When we find the holy grail for the NQ futures contract, we’ll move to the ES (S&P 500) futures contract and so on.

In January, we’re also going to publish our first annual report that compares the performance of our first nine strategies over the last year. We’ll look at extremes throughout the year to see how well they performed and we’ll see if our predictions, which are based on performance data, i.e. profit factor and drawdown, held up.

Ultimately, the best way to test the robustness of a strategy is to see how well it performs over time. There is no substitute. What we’ve learned over the last 10 months is that:

  • optimizing a strategy on parameters, inputs and chart types weakens it

  • optimizing a strategy on fundamental attributes like start date and asset class can add to its strength

  • the higher the trade count, the more robust the strategy is

  • strategies that perform well across multiple chart types and asset classes tend to be more robust

Strength can also be found in the profit factor, which is why we are hunting for a strategy with a profit factor of 3 or greater. A profit factor of 3 provides a cushion should the strategy encounter hard times.

Something else we can do from an analysis perspective is provide updated results for a rolling 12-month period (what we do now) as well as a continuous backtest that starts on January 1, 2020 (the beginning of this project) and ends on the current month. We will be sure to add the latter in the annual report.

One thing we are looking at in the future is the AI Generate optimization. The AI Generate optimizer is an experimental tool within NinjaTrader 8 that uses a Genetic Algorithm to help traders find new strategy approaches. It can combine up to 73 NinjaTrader default indicators, 25 Candlestick patterns, and single series custom indicators. It does this by searching through potential entry and exit combinations to find the best performers according to the Max Strength optimization criterion. We are interested in this because the randomized variable is the command itself, not the data that triggers the command. In this case, a Monte Carlo Simulation makes sense.

Please let us know if you have any other ideas.

Trade Well