I think there’s a good clue here to what may work for frontier models — you defi...

I think there’s a good clue here to what may work for frontier models — you definitely do not remember everything about a random day 15 years ago. By the same token, you almost certainly remember some things about a day much longer ago than that, if something significant happened. So, you have some compression / lossy memory working that lets you not just be a tabula rasa about anything older than [your brain’s memory capacity].

Some architectures try to model this infinite, but lossy, horizon with functions that are amenable as a pass on the input context. So far none of them seem to beat the good old attention head, though.