
I'll be that person:

* Gemini has the highest ceiling out of all of the models, but has consistently struggled with token-level accuracy. In other words, its conceptual thinking is well beyond other models, but it sometimes makes stupid errors when talking. This makes it hard to reliably use for tool calling or structured output. Gemini is also very hard to steer, so when it's wrong, it's really hard to correct.

* Claude is extremely consistent and reliable. It's very, very good at the details - but will start to forget things if things get too complex. The good news is Claude is very steerable and will remember those details if you remind it.

* GPT-5 seems to be completely random for me. It's so inconsistent that it's extremely hard to use.

I tend to use Claude because I'm the most familiar with it and I'm confident that I can get good results out of it.



I’d say GPT-5 is the best at following and remembering instructions. After an initial plan it can easily continue with said plan for the next 30-60 minutes without human intervention, and come back with a complete, working, finished feature/product.

It’s honestly crazy how good it is, coming from Claude. I never thought I could already hand something a design doc and have it one-shot the entire thing with such a level of accuracy. Even with Opus, I always need to either steer it, or fix the stuff it forgot by hand / have another phase afterwards to get it from 90% to 100%.

Yes, the Codex TUI sucks, but the model with high reasoning is an absolute beast, and it convinced me to switch from Claude Max to ChatGPT Pro.


Gemini is also the best for staying on the ball (when it does) over long contexts.

It's really the only model that can do large(r) codebase work.


Claude can do large code bases too, you just need to make it focus on parts that matter. Most of the coding tasks should not involve all parts of the code, right?


GPT-5 seems best at analyzing the codebase for me. It can pick up nuances and infer strategies Claude and Gemini seem to fail at.


Personally I prefer Gemini because I still use AI via chat windows, and it can do a good ~90k tokens before it starts getting stupid. I'm yet to find an agent that's actually useful, and doesn't constantly fuck up everywhere while burning money.



