
I tried an experiment like this a while back (for the GPT-5 launch) and was surprised at how ineffective it was.

This is a better version of what I tried but suffers from the same problem - the models seem to stick close to their original shapes and add new details rather than creating an image from scratch that's a significantly better variant of what they tried originally.



I feel like I’ve seen this with code too, where it’s unlikely to scrap something and try a new approach, and more likely to double down, iterating on a bad approach.

For the SVG generation, it would be an interesting experiment to seed it with increasingly poor initial images and see at what point, if any, the models stop anchoring on the initial image and just try something else.
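
Something like this rough sketch is what I have in mind. call_model() is just a stub for whatever model/API you're testing, and degrade()/anchoring_score() are crude placeholders, not anything rigorous:

    import random

    def call_model(prompt: str) -> str:
        """Stub: swap in a call to whichever model/API you want to test."""
        raise NotImplementedError

    def degrade(svg: str, severity: float) -> str:
        """Randomly drop a fraction of the SVG's lines to make a worse seed image."""
        lines = svg.splitlines()
        kept = [l for l in lines
                if l.strip().startswith(("<svg", "</svg")) or random.random() > severity]
        return "\n".join(kept)

    def anchoring_score(seed: str, output: str) -> float:
        """Fraction of seed lines that survive verbatim in the model's output."""
        seed_lines = {l.strip() for l in seed.splitlines() if l.strip()}
        out_lines = {l.strip() for l in output.splitlines()}
        return len(seed_lines & out_lines) / max(len(seed_lines), 1)

    def run_probe(base_svg: str, levels: int = 5):
        results = []
        for i in range(levels):
            severity = i / max(levels - 1, 1)  # 0.0 (untouched) up to 1.0 (badly mangled)
            seed = degrade(base_svg, severity)
            prompt = (
                "Here is an SVG of a pelican riding a bicycle:\n"
                + seed
                + "\nMake it better. You may edit it or redraw it entirely from scratch."
            )
            improved = call_model(prompt)
            results.append((severity, anchoring_score(seed, improved)))
        return results

If the anchoring score stays high even for badly degraded seeds, the model is clinging to whatever it was given.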


Yeah, for code I'll often start an entirely new chat and paste in just the bits I liked from the previous attempt.


> Yeah, for code I'll often start an entirely new chat and paste in just the bits I liked from the previous attempt.

Hi Simon, it is very likely that I am misunderstanding your comment, however:

Do you use chatbot UIs like chatgpt.com, claude.ai, or LibreChat for coding, instead of something like Cursor, Windsurf, Kiro, etc.?

If that is the case, I am really curious about this. Or, did you just mean for benchmarking various models via simple chat UIs?


I'm usually in either the ChatGPT or Claude web interfaces or working directly in Claude Code or Codex CLI.

In either case I'll often reset the context by starting a new session.


Thanks for the answer. OK, yes. That makes a lot more sense. I've been context-greedy ever since I read that Adobe research paper that I shared with you months ago. [0]

The whole "context engineering" concept is certainly a thing, though I do dislike throwing around the word "engineer" all willy-nilly like that. :)

In any case, thanks for the response. I just wanted to make sure that I was not missing something.

[0] https://github.com/adobe-research/NoLiMa


Maybe there’s a bias towards avoiding full rewrites? An “anti-refucktoring” bias

I’d be curious whether the approach would be improved by having the model generate a full pelican from scratch each time and then judge which variation is an improvement. Or, if something should be altered in each loop, perhaps it should be the prompt instead.
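
Roughly what I mean, using the same kind of hypothetical call_model() stub as in the sketch upthread; the judge step just asks the model to pick between the current best and a fresh attempt:

    def evolve_pelican(rounds: int = 5) -> str:
        draw_prompt = "Generate an SVG of a pelican riding a bicycle."
        best = call_model(draw_prompt)
        for _ in range(rounds):
            # Fresh draw each round: the model never sees its previous output.
            candidate = call_model(draw_prompt)
            verdict = call_model(
                "Which SVG is a better pelican riding a bicycle? Answer only A or B.\n\n"
                "A:\n" + best + "\n\nB:\n" + candidate
            )
            if verdict.strip().upper().startswith("B"):
                best = candidate
        return best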


Yeah, I think you're right. In most cases it's extremely annoying to have the model make any more than minimal changes to code you provide it.



