Overfitting: What is it and what can we do about it
First, happy reconstitution day! It is often the heaviest trading volume day of the year and this year it’s happening today. Get ready for a wild ride this afternoon. Click here to read more about what reconstitution is and how it impacts trading.
Second, there’s a mechanical elephant in the room and it’s called “overfitting”.
We aren’t programmers, statisticians, or mathematicians, so we have to translate many of these terms into something that makes sense to us from a trading perspective.
In a nutshell, the definition of overfitting is: Good performance on backtested data, bad performance on future data. Specifically, it’s when a strategy learns the “noise” that’s specific to the backtest period, but that noise does not apply to future data.
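To make the definition concrete, here is a minimal sketch (names and numbers made up for illustration): daily returns are pure noise, yet a strategy that picks, for each weekday, whichever direction happened to be profitable in the backtest will always show a non-negative backtest PnL by construction, while its future PnL is roughly zero on average.

```python
import random

random.seed(42)

# Simulate daily returns as pure noise: no weekday has a real edge.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
history = [(days[i % 5], random.gauss(0, 1)) for i in range(250)]
future = [(days[i % 5], random.gauss(0, 1)) for i in range(250)]

def fit_direction(data):
    """'Overfit' rule: for each weekday, trade whichever direction
    happened to be profitable during the backtest period."""
    direction = {}
    for d in days:
        total = sum(r for day, r in data if day == d)
        direction[d] = 1 if total > 0 else -1
    return direction

def pnl(direction, data):
    return sum(direction[day] * r for day, r in data)

rule = fit_direction(history)
print(f"backtest PnL: {pnl(rule, history):.1f}")  # always >= 0 by construction
print(f"future PnL:   {pnl(rule, future):.1f}")   # no edge carries forward
```

The backtest looks great because the rule memorized the noise; the "edge" was never real, so it doesn't survive out of sample.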
So, the question is: what counts as noise in trading, and how do you tell the difference between a strategy that has been “overfitted” to the backtest data and won’t perform well in the future, and one that performs well both in backtests and in the future?
This is what we’re hoping to answer with this post.
Trading is both art and science
I could limit this discussion to what we’re doing to prevent overfitting, which I do below, but I think it’s also important to understand how it applies to trading.
This is not an easy topic to discuss, primarily because the market is dynamic, and statistical tools don’t handle dynamic systems well. Likewise, any attempt to forecast the market requires both art and science. Overfitting may explain why simple models don’t work in the future, but the concept has limits for models that are dynamic and heavily influenced by outside forces, also referred to as noise.
I was amazed by how much literature there is on the topic of overfitting. It is of particular interest to those in the simulation and machine learning space, which makes sense. These strategies can be viewed as bots or machines, and the holy grail of trading strategies is a bot that has “learned” the market. To do this, the bot would have to “learn” how to treat noise. Is it good noise or bad noise? This is where the comparison breaks down. Eliminating the noise in trading is extremely difficult because the market is primarily a reaction to noise, but we may be able to discern the good noise from the bad.
What does this have to do with overfitting? Overfitting happens when a model learns the detail and noise in the historical data to the point that it hurts the model’s performance on new data. We know that we can optimize a strategy that then performs poorly in the future, but does it perform poorly because it learned the noise or because the noise is unlearnable? The only thing that’s truly learnable is the ability to identify the ideal conditions for a strategy to operate in. In other words, we’re looking for a strategy that knows when to trade. We can’t control the market, but we can control when the strategy takes action. The result is a strategy with fewer false signals (carving out the gross loss) and a higher profit factor.
So on one hand we can use the concept of overfitting to understand why a strategy does not perform well in the future, but on the other hand we must admit that the concept has limits for something as dynamic as market structure. It is almost impossible not to engage in overfitting when developing strategies, but there are things we can do to minimize it.
What can we do?
I want to go back to the original question. How are we making sure that our strategies aren’t the product of overfitting and will continue to do well in the future?
Optimize hyperparameters one at a time: When we make changes to a strategy, we make them one at a time rather than optimizing several variables simultaneously. This lets us isolate each change’s impact on net profit and profit factor.
Change the conditions of the strategy, not the hyperparameters: We don’t just want to create the machine; we want to shape the surroundings, or the environment, of the machine. We want to create resistance to overfitting within the strategy itself. Put yet another way, we want to create the machine and define the conditional path it takes within the strategy. By doing so, we aren’t as concerned with the hyperparameters as we are with the conditions of the strategy. For example, we aren’t as concerned with optimizing the hyperparameters of MACD (fast, slow, and signal smoothing) as we are with creating a condition that triggers an action (a MACD cross when volume is higher than 3000). The trader can then update the volume threshold rather than the hyperparameters of MACD. Volume sets the condition. In this way, the holy grail can be a moving target.
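The MACD-plus-volume example above might look like the following sketch. The helper names and the volume threshold are illustrative, not our production code; the MACD parameters stay at their textbook defaults while the volume condition does the gating:

```python
def ema(values, span):
    """Exponential moving average with the standard 2/(span+1) weight."""
    k = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + k * (v - out[-1]))
    return out

def macd_line(closes, fast=12, slow=26):
    f, s = ema(closes, fast), ema(closes, slow)
    return [a - b for a, b in zip(f, s)]

def entry_signal(closes, volumes, volume_floor=3000):
    """True only when MACD crosses above its signal line AND the
    volume condition (the 'environment') is met."""
    macd = macd_line(closes)
    signal = ema(macd, 9)
    crossed_up = macd[-2] <= signal[-2] and macd[-1] > signal[-1]
    return crossed_up and volumes[-1] > volume_floor

# Low volume blocks the trade regardless of what MACD does:
print(entry_signal(list(range(1, 40)), [100] * 39))  # False
```

The trader tunes `volume_floor` to the current environment instead of re-optimizing the MACD lengths, which is the point of the paragraph above.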
Focus on scenarios rather than sensitivity: This piggybacks on the above. We’re more interested in creating an environment within the strategy. That environment is a scenario (midday, low volatility, low volume) rather than a single variable. So, while we optimize one variable at a time, the conditions within the strategy are based on a scenario.
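A scenario gate for the "midday, low volatility, low volume" example might be sketched like this; the time window and thresholds are made up for illustration:

```python
from datetime import time

def in_scenario(bar_time, atr, volume,
                start=time(11, 0), end=time(14, 0),
                max_atr=15.0, max_volume=3000):
    """Allow trades only inside the 'midday, low volatility, low volume'
    scenario. All thresholds here are hypothetical placeholders."""
    return (start <= bar_time <= end
            and atr < max_atr
            and volume < max_volume)

print(in_scenario(time(12, 30), atr=8.0, volume=1200))  # True
print(in_scenario(time(9, 35), atr=8.0, volume=1200))   # False (too early)
```

The strategy's actions stay fixed; the gate decides whether the current bar belongs to the scenario the strategy was built for.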
Use two sets of data in the backtest: Going forward, in addition to what we’re doing now, we’d like to use two sets of historical data, one to optimize on and one to test on. We’re thinking of using 3 months of validation data for the test along with a 3-month break, which comes to 21 months altogether: we’ll optimize on the first 15 months, leave the next 3 months untouched, and then test on the final 3 months in a separate data chart. This format may change, but this is the current plan.
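The split described above can be sketched as a simple date calculation (month lengths approximated as 30 days; the function name and start date are illustrative):

```python
from datetime import date, timedelta

def walk_forward_split(start, train_months=15, gap_months=3, test_months=3):
    """Split history into an optimization window, an untouched gap,
    and a validation window. Months are approximated as 30 days."""
    month = timedelta(days=30)
    train_end = start + train_months * month
    test_start = train_end + gap_months * month
    test_end = test_start + test_months * month
    return (start, train_end), (test_start, test_end)

train, test = walk_forward_split(date(2020, 1, 1))
print("optimize on:", train)
print("validate on:", test)
```

The gap matters: it keeps conditions that bleed across the boundary (an ongoing trend, a volatility regime) from quietly leaking the optimization period into the test period.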
Use gross loss/profit in the calculation of profit factor: We use gross loss and gross profit, rather than a variant that uses average loss and profit, when calculating profit factor. The average-based ratio smooths out the impact of big winners and losers, which lends itself to overfitting, especially since profit factor is one of the primary performance measures we track and optimize on.
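The gross-based calculation is straightforward; here is a minimal version to make the point concrete (the trade lists below are invented examples):

```python
def profit_factor(trades):
    """Profit factor from gross figures: every winning and losing trade
    counts at full size, so one big loser isn't averaged away."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")

# One outsized loss drags the gross-based ratio down, as it should:
print(profit_factor([50, 40, 60, -30]))   # 5.0
print(profit_factor([50, 40, 60, -300]))  # 0.5
```

Because the gross figures preserve the full weight of tail trades, a strategy can't look healthy just by having many small wins around a rare catastrophic loss.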
Use more than one instrument in backtest data: As a final check, we’re going to start running strategies through other instruments such as ES, GC, BTC, YM and stocks.
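As a sketch of that cross-instrument check (the `run_strategy` stand-in returns dummy metrics; the real engine and the pass/fail bar would be ours to choose):

```python
def run_strategy(symbol):
    """Placeholder for the real backtest engine. Returns dummy
    neutral metrics so the loop below is runnable."""
    return {"net_profit": 0.0, "profit_factor": 1.0}

instruments = ["NQ", "ES", "GC", "BTC", "YM"]
results = {sym: run_strategy(sym) for sym in instruments}

# A loose robustness bar: profitable (profit factor > 1) on most
# markets, not just the one the strategy was tuned on.
passing = [sym for sym, r in results.items() if r["profit_factor"] > 1.0]
robust = len(passing) >= len(instruments) - 1
print(passing, robust)
```

An edge that only shows up on the optimization instrument is a strong hint that the strategy learned NQ-specific noise rather than a transferable condition.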
We’re hoping that these actions will help to ensure we’re picking up the “good” noise and not the noise that’s specific to a particular day or time frame when optimizing on backtests. That said, some of our strategies have no optimization at all and our focus is still heavily concentrated on NQ for its volatility.
I believe the market is full of both random and definitive events. Some things you do every day, some things are random (you know you eat lunch at around 12pm, but you don’t know what you’re going to eat). Like a simulation, everything is interconnected and random in the market, and yet if you span out far enough some predictions can be made. It is this fractal nature of trading that makes it so interesting. The conundrum in trading is that the further you span out (i.e., the longer the time frame), the more susceptible you are to risk; in other words, the more predictable the market is, the more risk your trades carry. The predictable part of trading is the science (indicators, etc.) and the unpredictable part is called noise. Indicators act differently depending on the noise. For example, most indicators are highly predictable in a rally, which is why traders love a good rally. So perhaps the best course of action is to create a strategy that is only triggered during a rally. That is much easier said than done, but just saying it gets us closer to the goal.
Trading is both art and science. Some of the most seminal papers in economic theory use formulas filled with assumptions that don’t hold (e.g., symmetry of information: in reality, an investor will never have access to the same information a corporation has, and investors rarely behave rationally). These assumptions were a way to make a science out of economic theory, and that science has little real-world application. So it’s important to use statistics to track and measure progress and performance, but it’s also important to realize that statistical tools have limits. Newton had to create the mathematical form we call calculus to express the laws of physics. Likewise, it could be that forecasting market structure requires a completely different way of thinking about “learning”, especially when the prevailing analytical framework dismisses relevant data as noise. Data scientists are quick to blame poor performance on noise, but if we’re looking for a mathematical formula that can predict the true nature of the market, it must include the noise. It isn’t that thinking about the market from a statistical perspective is bad, but it can limit our visibility, and therefore our ability to find the holy grail of trade strategy, if we get too attached.
Ultimately, while there are several key takeaways from this discussion, the main one is that not all noise in trading is bad. The key to discerning the good from the bad isn’t changing the actions of the strategy, but the environment those actions are taken in. And we’re finding that the strategies that harness the good noise through conditional equations and the choice of data series (minute, range, tick, etc.) seem to do better in the future.
What’s In The Pipeline?
There’s a lot in the pipeline, but we’re prioritizing something rather big right now. We’ve been impaled by a unicorn in the past, so we’re doing some due diligence before sharing, but the preliminary results are phenomenal. We want to say thank you to Pierre, one of our subscribers, for making a research request that has the potential to improve all of our strategies. Again, we’re compiling the data now and will share as soon as we feel it’s been properly vetted.