Using AI To Beat Funded Trader Games: Part 3
"Knowing is not enough; we must apply. Willing is not enough; we must do." -Bruce Lee
Important: There is no guarantee that ATS strategies will have the same performance in the future. I use backtests and forward tests to compare historical strategy performance. Backtests are based on historical data, not real-time data, so the results shared are hypothetical, not real. Forward tests are based on live data; however, they use a simulated account. Any success I have with live trading is atypical. Trading futures is extremely risky. You should only use risk capital to fund live futures accounts, and if you do trade live, be prepared to lose your entire account. There are no guarantees that any performance you see here will continue in the future. I recommend using ATS strategies in simulated trading until you/we find the holy grail of trade strategy. This is strictly for learning purposes.
In the evolving landscape of automated trading, Large Language Models (LLMs) are increasingly used to generate trading strategies. This analysis examines five LLM-created strategies designed specifically to pass a prop firm evaluation.
For those just joining us, a proprietary ("prop") trading firm trades its own capital, not client funds. Traditional firms hire salaried traders; modern "remote" firms allocate capital to independent traders and split the profits instead.
My approach: I interviewed five leading AI systems (Claude.ai, Manus, Grok3, OpenAI3, and OpenAI4.5) using the prompting techniques described in Part 1. Each AI developed a strategy tailored to specific prop firm parameters. The code revealed risk management features I hadn't considered before. One in particular is my new favorite “go-to” for systems in general: it can give any strategy a mathematical edge, and it is perhaps the slickest bit of risk management code I’ve ever seen. When I show you how it works, you’ll be shocked at the simplicity.
Performance Results
Last week I published Part 2 of this series, which provides preliminary performance stats for all five LLM-created strategies. This is where we are as of today:

Week-over-week, Claude took a bit of a hit, but continues to be profitable. Grok 3 improved, and Manus pulled a Biggie Smalls (went from negative to positive).
Key Finding: Even though all five strategies backtested well, forward testing revealed that only three of them (Claude37, Grok3, and Manus) achieved profitability, with Claude demonstrating superior performance. This outcome correlates strongly with the quality of each strategy's risk management implementation. It is also worth noting that the problem may lie in trade mechanics rather than strategy logic, especially for OpenAI3. If anyone figures out what the issue is, let me know and I’ll comp you a free month.
Where to from here…
The biggest issue with all of these strategies is risk management. Critical gaps exist across the board:
Only Claude explicitly monitors the consistency target requirement.
Most strategies implement daily loss limits but fail to properly track the trailing Maximum Loss Limit, which is calculated from end-of-day account balance highs (see the sketch after this list).
Several strategies use fixed contract sizes with no regard for account growth or position size limits.
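To make these gaps concrete, here is a minimal Python sketch of a guard a strategy could consult before every order. The class name, method names, and all threshold values (a $3,000 trailing drawdown, a $1,000 daily loss limit, a 30% consistency cap, and the sizing defaults) are hypothetical placeholders of my own, not any firm's actual rules and not code from any of the five strategies.

```python
from dataclasses import dataclass

@dataclass
class PropRiskGuard:
    """Tracks hypothetical prop-firm limits for a simulated account.

    All threshold values are placeholders, not any specific firm's rules.
    """
    trailing_drawdown: float = 3_000.0  # max distance below the end-of-day balance high
    daily_loss_limit: float = 1_000.0   # max loss allowed within a single session
    consistency_cap: float = 0.30       # max share of total profit from any one day

    def __post_init__(self) -> None:
        self.eod_high = 0.0             # highest end-of-day balance seen so far
        self.day_start_balance = 0.0
        self.best_day_profit = 0.0
        self.total_profit = 0.0

    def start_day(self, balance: float) -> None:
        self.day_start_balance = balance
        self.eod_high = max(self.eod_high, balance)  # seed on the first session

    def end_day(self, balance: float) -> None:
        # The floor ratchets up only on end-of-day highs, never intraday.
        self.eod_high = max(self.eod_high, balance)
        day_profit = balance - self.day_start_balance
        self.best_day_profit = max(self.best_day_profit, day_profit)
        self.total_profit += day_profit

    def max_loss_floor(self) -> float:
        """Balance level below which the evaluation is failed."""
        return self.eod_high - self.trailing_drawdown

    def may_trade(self, balance: float) -> bool:
        """Gate every new order: stand down before a limit is breached."""
        if balance <= self.max_loss_floor():
            return False  # trailing Maximum Loss Limit hit
        if self.day_start_balance - balance >= self.daily_loss_limit:
            return False  # daily loss limit hit
        return True

    def consistency_ok(self) -> bool:
        """True while no single day exceeds the cap on total profit."""
        if self.total_profit <= 0:
            return True  # rule only bites once the account is net profitable
        return self.best_day_profit / self.total_profit <= self.consistency_cap

    def contracts_for(self, balance: float, risk_per_contract: float,
                      buffer_fraction: float = 0.10, max_contracts: int = 5) -> int:
        """Scale size with the buffer above the trailing floor, not a fixed lot."""
        buffer = balance - self.max_loss_floor()
        raw = int((buffer * buffer_fraction) / risk_per_contract)
        return max(0, min(raw, max_contracts))
```

Calling start_day and end_day around each session and checking may_trade before every order keeps a strategy inside both loss limits, while consistency_ok and contracts_for address the other two gaps. The key detail is that max_loss_floor ratchets only on end-of-day highs, so an intraday equity spike never raises the floor.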
The goal is to create a master strategy that is both profitable and compliant with the prop firm's risk parameters. Each LLM used a different mathematical framework to balance risk and reward. So far, Claude and Grok are the standout strategies, so let’s start by looking at what makes both models so special.