Loop With Caution
Agent loops are hot this week, proceed with caution
To save you time, this applies to production codebases rather than throwaway projects. If you're building something with a user base of 1, loop away, YAGNI.
You shouldn't be prompting your agent, you should be designing loops that prompt your agent. [1]
Hang on, stepping out of the echo chamber here.
OK, I'm out.
Let's talk about the current SOTA X advice: stop prompting and let your agent prompt itself forever. We even have a new term in less than 48 hours: Loop Engineering!
I think the instinct is right. Designing loops is the next step toward autonomous agents, but unless you have a codebase or platform with the right patterns to scale, you're going to run into trouble.
Here's how I see this playing out:
Your agent loops seem to work well in weeks 1 and 2, if you're lucky enough to get that far. By week 3 you've generated so much slop that the cost of each loop starts to compound, because every run also has to address the inconsistencies the previous loops created.
The foundation or platform your product sits on is so fragile that every new feature cascades, and the underlying code that supports the feature (if any) requires changes.
Things you've done:
- Killed the scalability of whatever application you're building
- Blown up your token cost as you continue to build and scale your app
I hope the agent decided to include some observability in your project because it's on a highway to your project graveyard.
I'm not against AI. In fact, it's quite the opposite: I've been an early adopter and advocate for the last few years.
I have one ask: can we please be a little more careful about what we're promoting? Advice like this makes it harder to increase adoption among veteran engineers who are still skeptical about AI. Yes, they still exist, and sure, they've become more open to the idea of AI assisting them since the start of this year.
It only takes one AI-pilled, token-maxxing junior engineer to ship a looped feature that breaks prod and trust in this tooling.
Let's expand on the one word in this statement that might actually make loops successful.
design
What goes into actually designing a loop such that it scales to deliver what you actually asked for?
Harness engineering plays a role here. When I run a loop, I want to ensure it has the necessary context to output something that matches the patterns and quality of the code I expect in my codebase.
So what does this mean and how do we get there?
Take this list with a grain of salt; it's a reference for everything that has to go right:
- The codebase properly implements some type of observability stack, logging/metrics
- DTOs are clearly defined. Whether it's a database schema, GraphQL schema, or OpenAPI schema, all of this is very clear to the agent and that context is available in the decision-making process.
- Some form of testing exists, and it's not the same agent writing the code and tests
- A UI library and design system exists for consistency. Use shadcn or whatever, but don't let the agent write individual components every time.
- Data fetching is consistent, cache invalidation exists, it's very clear how data moves from your backend to your frontend.
- Please, please, make sure at some point you're having the agent document blessed patterns and that you have some process to clean up stale references after each code change.
Of these items, documentation is the most important. I've been leaning on ADRs (architecture decision records) for this for the last year and a half and they're working great. Use whatever you want, but tl;dr: make a decision and stick to it. The counterargument is that code itself is documentation, and yes, I agree, depending on the age of the codebase you're working in.
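To make the "clean up stale references" part concrete, here's a minimal sketch of a check you could run after each code change. It assumes a convention I'm making up for illustration: ADRs live as Markdown files in one directory and mention source files in backticks. The function name and the regex are hypothetical, so adapt them to however your docs actually reference code.

```python
import re
from pathlib import Path

# Assumed convention (not from any standard): ADRs mention source files
# in backticks, e.g. `src/api/todos.py`.
PATH_PATTERN = re.compile(r"`([\w./-]+\.\w+)`")


def find_stale_references(adr_dir: Path, repo_root: Path) -> dict[str, list[str]]:
    """Map each ADR file name to the referenced paths that no longer exist."""
    stale: dict[str, list[str]] = {}
    for adr in sorted(adr_dir.glob("*.md")):
        referenced = PATH_PATTERN.findall(adr.read_text(encoding="utf-8"))
        # A reference is stale when the path is missing from the repo.
        missing = [path for path in referenced if not (repo_root / path).exists()]
        if missing:
            stale[adr.name] = missing
    return stale
```

Run it in CI or as the last step of a loop and fail when the result is non-empty, so a deleted or renamed file can't leave a "blessed pattern" pointing at nothing.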
Designing Agent Loops
Generally, here's how I'd design my loop.
Prerequisites
- Plugging in graphify here: it builds a graph of your code and is great for clustering concepts when you're focusing on a specific part of the code. I think there's room for improvement, but I'm saving that for another post.
- You've actually built a platform with patterns that scale to support product code, see harness engineering.
Loop Design
/loop add a feature to my todo app that emails VCs every time I merge a PR
- Explore: Learn about the codebase, the accepted patterns, code style, existing features, supporting documentation.
- Plan: Generate a plan for how we implement this new feature, include strategies for observability, testing, UI design, schema changes.
- Execute: Use a sub agent with fresh context to write the tests first (yes, TDD, there's still value here), then use a separate sub agent to implement the code.
- Review: Use a sub agent with fresh context to review the code/tests and validate the implementation.
End of the loop, at this point all of the code has hopefully been written and reviewed.
Call me old school, but I still read the code and manually test it.
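Here's a rough sketch of the four phases as code, mostly to show where the fresh contexts and the stopping condition live. Everything in it is an assumption for illustration: `spawn_agent` stands in for whatever starts a sub agent in your harness, the prompts are placeholders, and the "review starts with APPROVE" convention is something I invented so the loop has an exit.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a sub agent: takes a prompt, returns text.
Agent = Callable[[str], str]


@dataclass
class LoopResult:
    plan: str
    tests: str
    code: str
    review: str
    approved: bool
    attempts: int


def run_loop(task: str, spawn_agent: Callable[[], Agent], max_attempts: int = 3) -> LoopResult:
    """Explore -> Plan -> Execute -> Review, with a hard cap on retries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Explore and Plan share one agent so the plan builds on the findings.
    planner = spawn_agent()
    findings = planner(f"Explore the codebase, its patterns and docs for: {task}")
    plan = planner(
        "Plan the feature, covering observability, testing, UI and schema changes.\n"
        f"Task: {task}\nFindings: {findings}"
    )

    feedback = ""
    for attempt in range(1, max_attempts + 1):
        # Every spawn_agent() call is a fresh context: the test writer,
        # the implementer and the reviewer never share history.
        tests = spawn_agent()(f"Write the tests first.\nPlan: {plan}\n{feedback}")
        code = spawn_agent()(f"Implement the code.\nPlan: {plan}\nTests: {tests}\n{feedback}")
        review = spawn_agent()(
            f"Review the code and tests against the plan.\nPlan: {plan}\nTests: {tests}\nCode: {code}"
        )
        # Assumed convention: the reviewer's reply starts with APPROVE or not.
        approved = review.strip().upper().startswith("APPROVE")
        if approved:
            break
        feedback = f"Previous review: {review}"

    return LoopResult(plan, tests, code, review, approved, attempt)
```

The `max_attempts` cap is the part I care about most. Without it, a reviewer that never approves turns the loop into an open tab on your token bill. And an `approved=True` result still only means the reviewer agent was happy, not that you are.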
Caution
Instructions
add a feature to my todo app that emails VCs every time I merge a PR
Call it a PRD or Spec, but this is a pretty lightweight set of instructions. Depending on where the LLM roulette wheel lands, you might get very different implementations from run to run.
Someone still needs to decide what actually goes into this feature before it's built.
Who is making these decisions? The model? If I'm not supposed to prompt it anymore how does it know what I want it to build?
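One way to force those decisions to happen before any tokens are spent is to make the loop refuse to start while questions are still open. This is a minimal sketch, not a real spec format: `FeatureSpec`, its fields, and the example questions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    summary: str
    open_questions: list[str] = field(default_factory=list)
    # question -> answer, filled in by a human, not the model
    decisions: dict[str, str] = field(default_factory=dict)

    def decide(self, question: str, answer: str) -> None:
        """Record a human decision and close the matching open question."""
        if question not in self.open_questions:
            raise KeyError(f"Not an open question: {question}")
        self.open_questions.remove(question)
        self.decisions[question] = answer

    def ready(self) -> bool:
        return not self.open_questions

    def to_prompt(self) -> str:
        """Render the spec for the loop, but only once nothing is undecided."""
        if not self.ready():
            raise ValueError(f"Undecided: {', '.join(self.open_questions)}")
        lines = [self.summary]
        lines += [f"- {q} {a}" for q, a in self.decisions.items()]
        return "\n".join(lines)
```

For the example above, the open questions might be "Which VCs?" and "Which branches count as a merge?". Until someone answers them, `to_prompt()` raises instead of letting the roulette wheel pick.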
Token cost
My goal is to spend the fewest tokens possible at any time. Personally, I want to stretch my Claude Max subscription to last through the end of the week (currently at 85% usage on Tuesday :/). At work and throughout the industry, the writing is on the wall: token spend is going to become limited in some way.
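If spend is going to be limited anyway, I'd rather have the loop enforce the limit than find out on Tuesday. Here's a minimal sketch of a per-loop budget. `TokenBudget` and `BudgetExceeded` are hypothetical names, and where the token counts come from (your provider's usage numbers, your own estimate) is up to you.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the limit."""


@dataclass
class TokenBudget:
    limit: int
    spent: int = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return what is left; refuse to go over the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.limit:
            raise BudgetExceeded(
                f"{self.spent + tokens} tokens would exceed the limit of {self.limit}"
            )
        self.spent += tokens
        return self.limit - self.spent
```

Call `charge()` after every sub agent call and let the exception stop the loop. A loop that dies at a known ceiling is a lot cheaper than one that quietly compounds.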
Conclusion
Speaking to the new rockstars of the AI generation: I want you to succeed. I believe in the product and where it's headed, but this type of hype isn't it. It's not a gut feeling; I've taken both paths. The path with less human input isn't producing the outcomes I want. Skill issue? Sure, call it that. These tools can't read my mind, yet, but are we really ready to promote this idea of "stop prompting the agent"?