It's not really news that today's AIs make dumb mistakes, especially around BPE ...

imiric · 2025-07-07T01:49:10

> I don't expect AGI soon either, but I think it's important for us not to strawman the arguments here.

This is not a strawman. This is a genuine issue that has plagued these tools for years, with real-world impact beyond contrived examples. Yet users are expected to ignore it because that is how these tools work? Nonsense. It's insulting that you would trivialize something like this.

> (a) the rate of improvement has been fast

I wouldn't describe it as "fast". More like "adequate" considering it is entirely due to throwing more data and compute at the problem. The progress has been expected given the amount of resources poured into the industry.

Now that we're reaching the end of the road for the scaling approach, the focus has shifted towards engineering value-added services ("agents"), and lots of PR to keep the hype train running. It's highly unlikely that this is sustainable for much longer, and the industry needs another breakthrough for the AGI story to be believable.

> (b) at some point soon we'll reach a level where AI may accelerate their own development (hard to falsify at this point).

Why isn't this happening today? Surely AI researchers and engineers are dogfooding their product, and they're many times more productive with it than without it. Why, then, are improvements still incremental? Why are we still talking about the same issues after all these years? Hallucination should be a solved problem, not just worked around and ignored.

> I think it's also important to realize that for AGI to arrive, only 1 model out of many attempts needs to qualify.

All models have the same issues. Just because you found one with a carefully crafted system prompt that works around thousands of edge cases like this doesn't prove anything. Or are you implying that o3 doesn't use BPE?

> So we'll need to move the goalposts a little further.

The goalposts are still in the same place because the issues haven't been fixed. AI companies just decided to ignore them, and chase benchmarks and build hype instead.