That's right, but once you did that compression it wouldn't be an n-gram anymore. What I'm trying to get across is that you could model GPT-4 as an equivalent 8000-gram in some abstract sense, but that's not a good mental picture of how it actually works. Internally, GPT-4 is no more an 8000-gram than Stockfish is a giant lookup table of chess positions. GPT-4 is learning RASP programs, not raw statistical text correlations.
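To make the contrast concrete, here's a toy sketch of what a RASP program looks like. The `select`/`aggregate` primitives follow Weiss et al.'s "Thinking Like Transformers"; everything else is illustrative, and of course this says nothing about GPT-4's actual weights:

```python
def select(keys, queries, predicate):
    # Build an attention-like selector matrix: entry [q][k] is True
    # when the predicate relates key position k to query position q.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values, default=None):
    # For each query position, gather the selected values
    # (here each row selects at most one position).
    out = []
    for row in selector:
        picked = [v for sel, v in zip(row, values) if sel]
        out.append(picked[0] if picked else default)
    return out

def reverse(tokens):
    # A RASP program that reverses a sequence: every position attends
    # to its mirror position. A finite table of n-gram statistics has
    # no compact way to express this computation.
    n = len(tokens)
    idx = list(range(n))
    sel = select(idx, idx, lambda k, q: k == n - 1 - q)
    return aggregate(sel, tokens)

print(reverse(list("hello")))  # ['o', 'l', 'l', 'e', 'h']
```

The point of the toy: the "program" is a structured computation over positions, which is closer to what transformer layers do than a lookup over memorized 8000-token contexts.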