Let's ignore whatever BPE is for a moment. I, frankly, don't care about the technical reason these tools exhibit this idiotic behavior.
The LLM is generating "reasoning" output that breaks down the problem. It's capable of spelling out the word. Yet it hallucinates that the letter between the two 'A's in 'Hawaii' is 'I', followed by some weird take that it can be confused for a 'W'.
So if these tools are capable of reasoning and are so intelligent, surely they would be able to overcome some internal implementation detail, no?
Also, you're telling me that these issues are so insignificant that nobody has done anything about them in 5 years? I suppose it's much easier and more profitable to throw data and compute at the same architecture than to fix 5-year-old issues that can be hand-waved away by some research papers.