Using AI To Develop Prop Firm Strategies: Part 4

Claude is up $25K.

Jun 08, 2025

Important: There is no guarantee that ATS strategies will have the same performance in the future. I use backtests and forward tests to compare historical strategy performance. Backtests are based on historical data, not real-time data, so the results shared are hypothetical, not real. Forward tests are based on live data; however, they use a simulated account. Any success I have with live trading is atypical. Trading futures is extremely risky. You should only use risk capital to fund live futures accounts, and if you do trade live, be prepared to lose your entire account. There are no guarantees that any performance you see here will continue in the future. I recommend using ATS strategies in simulated trading until you/we find the holy grail of trade strategy. This is strictly for learning purposes.

For links to all strategies click here

One of my favorite Star Trek: The Next Generation episodes is "Darmok". This episode is known for featuring a unique alien species, the Tamarians, who communicate through allegories and metaphors based on their cultural mythology. It revolves around Captain Picard's attempts to establish communication with the Tamarian captain, Dathon, who speaks in these seemingly nonsensical phrases. Examples include:

"Darmok and Jalad at Tanagra": Meaning cooperation or working together.
"Shaka, when the walls fell": Signifying failure.
"Temba, his arms wide": Indicating a gift or offer.

All context and meaning are communicated in just a few words. Through shared experiences and the patient efforts of both captains, Picard gradually begins to decipher the Tamarian language, and the phrase "Sokath, his eyes open" comes to represent a moment of enlightenment or a breakthrough in understanding.

Back in Part 1, I argued that the real edge isn’t using AI, but communicating with it.
Dario Amodei’s essay on interpretability suggests that we are “growing” systems we don’t yet understand, so extracting reliable output depends on how we speak to them.

That first post shared the five “prompt enzymes” I use every day:

ask the model to show and review its work
trigger domain expertise with precise language (i.e., Darmok and Jalad at Tanagra)
think in layers, not single dimensions
build a decision hierarchy
iterate in small, testable chunks

And I’m going to add a sixth one that was implied but never said outright: use other LLMs to validate. For example, I used Claude to help write the code for most strategies, then used Gemini to check that code.

The process seeded five supposedly prop-firm-ready strategies—one each from Claude, Manus, Grok 3, OpenAI 3 and OpenAI 4.

Part 2 walked through the why of prop evaluations, crowned Topstep the cleanest battlefield, and published the first forward-test stats. Claude37 and Grok 3 sprang out front; Manus and the two OpenAI variants wobbled.

Part 3 dug into why Claude and Grok survived. Claude’s edge came from multi-timeframe confirmation and dynamic sizing; Grok’s from an elegantly simple risk-reward module.

Each LLM used a different mathematical framework to balance risk and reward. So far Claude and Grok are the standout strategies, so let’s start by looking at how each strategy is doing today. Here’s a quick overview:

OpenAI3 did not perform. Gemini failed to compile.

Claude still wears the cape, up $25,415 since April 22 on 436 trades, though its max MAE (maximum adverse excursion) is high. Manus, the quiet assassin (low count, steady dollars, pain in check), outflanked GROK and is up $5,025. Manus’ worst sting is less than $3K and its average MAE is $1,048, which makes it the only truly viable strategy to use with an evaluation. GROK3 (aka Shaka, when the walls fell) is coming in flat to positive at $275, but I believe GROK’s issue is in the implementation, not the logic.
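For anyone who wants to reproduce the MAE numbers from their own trade logs, here is a minimal Python sketch of how max and average MAE can be computed. The `Trade` fields, the `point_value` parameter, and the function names are my own assumptions for illustration; this is not how the strategies themselves report MAE.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trade:
    # All field names are hypothetical, chosen for this illustration.
    direction: int      # +1 for a long trade, -1 for a short trade
    entry: float        # entry price
    worst_price: float  # lowest price seen while long, highest while short
    contracts: int      # number of contracts held


def mae_dollars(trade: Trade, point_value: float) -> float:
    """Dollar value of the worst move against the trade while it was open."""
    adverse_points = (trade.entry - trade.worst_price) * trade.direction
    # If price never moved against the trade, MAE is zero, not negative.
    return max(adverse_points, 0.0) * point_value * trade.contracts


def mae_summary(trades: List[Trade], point_value: float) -> Tuple[float, float]:
    """Return (max MAE, average MAE) in dollars across a list of trades."""
    maes = [mae_dollars(t, point_value) for t in trades]
    return max(maes), sum(maes) / len(maes)
```

The point of looking at max MAE next to profit is that a strategy can finish the day green and still have dipped far enough intratrade to break an evaluation rule along the way.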

Claude nails profit, but still breaks the trailing drawdown by ~$4K. Manus ticks every box except the “five active days” rule. The biggest issue with all of these strategies is risk management. Critical risk management gaps include:

Only Claude explicitly monitors the consistency target requirement.
Most strategies implement daily loss limits, but fail to properly track the trailing Maximum Loss Limit, which is calculated from end-of-day account balance highs.
Several strategies use fixed contract sizes without consideration of account growth or position size limits.
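To make the trailing Maximum Loss Limit gap concrete, here is a rough Python sketch of the bookkeeping a strategy would need. It only models what is described above: the floor trails the highest end-of-day balance, not intraday peaks. The class name, method names, and dollar figures in the usage comment are assumptions for illustration, and any additional prop firm rules (such as the floor locking at a certain level) are deliberately left out.

```python
class TrailingLossLimit:
    """Tracks a loss floor that trails the highest end-of-day balance.

    Hypothetical helper: names and behavior are a sketch, not a firm's
    official calculation.
    """

    def __init__(self, start_balance: float, max_loss: float):
        self.max_loss = max_loss
        self.eod_high = start_balance  # highest end-of-day balance so far

    @property
    def floor(self) -> float:
        """Balance at or below which the account is considered failed."""
        return self.eod_high - self.max_loss

    def on_session_close(self, eod_balance: float) -> None:
        """Call once per session; only a new end-of-day high moves the floor."""
        self.eod_high = max(self.eod_high, eod_balance)

    def is_breached(self, live_balance: float) -> bool:
        """Check a live (intraday) balance against the current floor."""
        return live_balance <= self.floor


# Usage with made-up numbers: a 50,000 account and a 2,000 limit.
# limit = TrailingLossLimit(50_000, 2_000)
# limit.on_session_close(51_500)   # floor trails up to 49,500
# limit.is_breached(49_400)        # True: a daily loss limit alone would miss this
```

A daily loss limit resets every session, so a strategy that only tracks daily losses can give back several good days in a row and cross the trailing floor without ever tripping its own stop.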

Enter the hybrid approach:

Use Claude's entry signals
Replace Claude's risk calculation with GROK's risk/reward system (a mathematics lesson disguised as two lines of code).

That’s exactly what I did in Part 3 and it created Strategy 101. Here’s another look at those backtest results for quick reference:

I've included this as Strategy 101 for download below. It's fickle—when I ran it in the forward test last week, the trades were poor. I'll be testing it again this week. In the meantime, I've decided to release it early. I welcome your forward test results on this one.
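For readers who have not seen a risk/reward module before, here is a generic Python sketch of the idea behind swapping one into a hybrid like this. To be clear, this is not GROK's actual code or Strategy 101's logic; the function names, parameters, and the contract cap are all assumptions I made up to show the shape of the math: the stop distance defines the risk, the target is a multiple of that risk, and the position size falls out of how many dollars you are willing to lose.

```python
import math
from typing import Tuple


def bracket_prices(entry: float, stop_points: float,
                   reward_multiple: float, direction: int) -> Tuple[float, float]:
    """Return (stop_price, target_price) for a trade.

    direction is +1 for long, -1 for short. The target sits
    reward_multiple times as far from entry as the stop does.
    """
    stop = entry - direction * stop_points
    target = entry + direction * stop_points * reward_multiple
    return stop, target


def contracts_for_risk(risk_dollars: float, stop_points: float,
                       point_value: float, max_contracts: int) -> int:
    """Size the position so a full stop-out loses at most risk_dollars.

    max_contracts is a hypothetical cap standing in for a position size limit.
    """
    risk_per_contract = stop_points * point_value
    if risk_per_contract <= 0:
        raise ValueError("stop_points and point_value must be positive")
    sized = math.floor(risk_dollars / risk_per_contract)
    return max(0, min(max_contracts, sized))
```

Sizing from dollars at risk rather than a fixed contract count also speaks to the fixed-size gap noted earlier, since the risk budget can be tied to the current account balance.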

ATS Strategies That Fit The Prop Firm Profile

We could end here, but I want to take it one step further. What if we combine the best of the LLM-generated strategy logic with logic from our own strategies? I shared a few strategies that fit the prop firm profile in Part 3. Here’s a quick update of how those strategies are doing after last week:
