
I tried an experiment like this a while back (for the GPT-5 launch) and was surprised at how ineffective it was.

This is a better version of what I tried but suffers from the same problem - the models seem to stick close to their original shapes and add new details rather than creating an image from scratch that's a significantly better variant of what they tried originally.



I feel like I’ve seen this with code too, where it’s unlikely to scrap something and try a new approach, and more likely to double down, iterating on a bad approach.

For the SVG generation, it would be an interesting experiment to seed it with increasingly poor initial images and see at what point, if any, the models stop anchoring on the initial image and just try something else.
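
Something like this rough sketch is what I have in mind. call_model() is just a stub for whatever model/API you're testing, and degrade()/anchoring_score() are crude placeholders, not anything rigorous:

    import random

    def call_model(prompt: str) -> str:
        """Stub: swap in a call to whichever model/API you want to test."""
        raise NotImplementedError

    def degrade(svg: str, severity: float) -> str:
        """Randomly drop a fraction of the SVG's lines to make a worse seed image."""
        lines = svg.splitlines()
        kept = [l for l in lines
                if l.strip().startswith(("<svg", "</svg")) or random.random() > severity]
        return "\n".join(kept)

    def anchoring_score(seed: str, output: str) -> float:
        """Fraction of seed lines that survive verbatim in the model's output."""
        seed_lines = {l.strip() for l in seed.splitlines() if l.strip()}
        out_lines = {l.strip() for l in output.splitlines()}
        return len(seed_lines & out_lines) / max(len(seed_lines), 1)

    def run_probe(base_svg: str, levels: int = 5):
        results = []
        for i in range(levels):
            severity = i / max(levels - 1, 1)  # 0.0 (untouched) up to 1.0 (badly mangled)
            seed = degrade(base_svg, severity)
            prompt = (
                "Here is an SVG of a pelican riding a bicycle:\n"
                + seed
                + "\nMake it better. You may edit it or redraw it entirely from scratch."
            )
            improved = call_model(prompt)
            results.append((severity, anchoring_score(seed, improved)))
        return results

If the anchoring score stays high even for badly degraded seeds, the model is clinging to whatever it was given.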


Yeah, for code I'll often start an entirely new chat and paste in just the bits I liked from the previous attempt.


> Yeah, for code I'll often start an entirely new chat and paste in just the bits I liked from the previous attempt.

Hi Simon, it is very likely that I am misunderstanding your comment, however:

Do you use chatbot UIs like chatgpt.com, claude.ai, or LibreChat for coding, instead of something like Cursor, Windsurf, Kiro, etc.?

If that is the case, I am really curious about this. Or, did you just mean for benchmarking various models via simple chat UIs?


I'm usually in either the ChatGPT or Claude web interfaces or working directly in Claude Code or Codex CLI.

In either case I'll often reset the context by starting a new session.


Thanks for the answer. OK, yes. That makes a lot more sense. I've been context-greedy ever since I read that Adobe research paper that I shared with you months ago. [0]

The whole "context engineering" concept is certainly a thing, though I do dislike throwing around the word "engineer" all willy-nilly like that. :)

In any case, thanks for the response. I just wanted to make sure that I was not missing something.

[0] https://github.com/adobe-research/NoLiMa


Maybe there’s a bias towards avoiding full rewrites? An “anti-refucktoring” bias

I’d be curious whether the approach would be improved by having the model generate a full pelican from scratch each time and then judge which variation is an improvement. Or, if something should be altered in each loop, perhaps it should be the prompt instead.
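
Roughly what I mean, using the same kind of hypothetical call_model() stub as in the sketch upthread; the judge step just asks the model to pick between the current best and a fresh attempt:

    def evolve_pelican(rounds: int = 5) -> str:
        draw_prompt = "Generate an SVG of a pelican riding a bicycle."
        best = call_model(draw_prompt)
        for _ in range(rounds):
            # Fresh draw each round: the model never sees its previous output.
            candidate = call_model(draw_prompt)
            verdict = call_model(
                "Which SVG is a better pelican riding a bicycle? Answer only A or B.\n\n"
                "A:\n" + best + "\n\nB:\n" + candidate
            )
            if verdict.strip().upper().startswith("B"):
                best = candidate
        return best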


Yeah, I think you're right. In most cases it's extremely annoying to have the model make any more than minimal changes to code you provide it.



