Include a verifiable, testable exit criterion in the prompt (e.g. "the code compiles") and use an agentic AI like Cursor or Codex with it. You'd be surprised what happens :)
Is Claude Code with both Sonnet and Opus agentic enough? Because it is constantly finding creative ways to ignore direct, repeated instructions ("user asked X but it is hard, let's do Y instead"), implementing fake tests ("feature X is complex. we need to test it completely. let's write script that will create files that feature X would have created, then test that files exist"), and sabotaging and deleting working code ("we need to track FD of the open file (runs strace). The FD is 5 (hardcodes 5 in the code instead of implementing anything useful) tests pass now!").
I have never experienced this level of malice and sweet-talking work avoidance from anyone. It apologizes like an alcoholic, then proceeds to double down.
Can you force it to produce actually useful code? Yes, by repeatedly yelling at it to please follow the instructions. In the process, it will break or delete code, or introduce hard-to-find bugs, in the rest of the codebase.
I'm really curious whether anyone actually has this thing working, or whether they simply haven't bothered to read the generated code.
You need to use the features that Claude Code gives you in order to be successful with it. Your build and tests should run in a Stop hook that prevents Claude from stopping if the build or tests fail. Combining this with a Stop hook that bails out once the first hook has failed n times prevents infinite loops.
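A minimal sketch of that setup, assuming Claude Code's Stop-hook behavior where a hook command that exits with code 2 blocks the stop and feeds its stderr back to Claude. For brevity it folds both hooks into one script with a failure counter. The build/test command (`make test`), the counter file path, the `--run` flag, and `MAX_FAILURES` are all hypothetical placeholders; you would register something like `python3 .claude/hooks/stop_hook.py --run` as the Stop hook command in your Claude Code settings.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical placeholders: adjust to your project.
BUILD_AND_TEST_CMD = ["make", "test"]
COUNTER_FILE = Path(".claude/stop_hook_failures")
MAX_FAILURES = 3  # the "n" from the comment above


def decide(tests_passed: bool, failures_so_far: int,
           max_failures: int = MAX_FAILURES) -> tuple[int, int]:
    """Return (exit_code, new_failure_count).

    Exit code 2 is meant to block Claude from stopping; 0 lets it stop.
    """
    if tests_passed:
        return 0, 0  # green build: allow stop and reset the counter
    failures = failures_so_far + 1
    if failures > max_failures:
        return 0, 0  # already failed n times: bail out to avoid an infinite loop
    return 2, failures  # keep Claude working on the failure


def main() -> None:
    result = subprocess.run(BUILD_AND_TEST_CMD, capture_output=True, text=True)
    previous = int(COUNTER_FILE.read_text()) if COUNTER_FILE.exists() else 0
    code, count = decide(result.returncode == 0, previous)
    COUNTER_FILE.parent.mkdir(parents=True, exist_ok=True)
    COUNTER_FILE.write_text(str(count))
    if code == 2:
        # Send only the tail of the output, to keep Claude's context small.
        tail = (result.stdout + result.stderr)[-2000:]
        print(f"Build/tests failed (attempt {count}/{MAX_FAILURES}):\n{tail}",
              file=sys.stderr)
    sys.exit(code)


# Hypothetical --run flag: importing the module never triggers a build.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

With `MAX_FAILURES = 3`, Claude gets pushed back three times on a failing build; the fourth failing stop attempt is let through so the session can end, and the counter resets for next time.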
With anything beyond a toy project, you need to be really good at context window management. Usually this means using subagents and scoping prompts correctly by placing CLAUDE.md files next to the relevant code. Your main conversation's context window usage should pretty much never go above 50%. Use the /clear command between unrelated tasks. Consider whether recurring sequences of tool calls could be unified into a single skill.
Instead of sending instructions to the agent straight away, try planning with it and prompting it to ask you questions about your plan. The planning phase is a good place to give Claude more room to think with "think > think hard > ultrathink". If you are still struggling with the agent not complying, try adding emphasis with "YOU MUST" or "IMPORTANT".
As I'm getting better and better results with it, I'm having it do more and more things. I went through a complete agentic refactor of a project from Angular 17 to Angular 20 (RxJS to Signals) and I'd say it did it perfectly. A few times I'd have it summarize and start a new chat, because it can get less effective when the history gets too long. I also had to iterate on what I wanted and do things one step at a time. That said, it was very clear that it also wanted to do things in pieces and test each major change before continuing.
I think like any tool it has its pros and cons, and the more you use it, the more you figure out how to make the best use of it and when to give up.
Don’t even get me started. A colleague of mine made me screenshot a .env on a video call “for security” and I spent 30 min correcting OCR on it until it worked.
This makes me laugh. “GenAI makes you a genius without any effort” and “Stop wasting time learning the craft” are oxymorons in my head. Having AI in my life has been like having an on-demand tutor in any field. I have learned so much.
Code is amazing. I'm not sure why OpenAI isn't using it as their default CLI. I was cancelling my membership and stumbled upon it right before, now I'm dropping my other subs to move to this.
But if that is going to be the case, I want to be the best of the best at understanding it all so that I’m the first hired and last fired lol