Ts_zip: Text Compression Using Large Language Models

WantonQuantum · on Aug 14, 2023

This is interesting if impractical.

It occurs to me that using LLMs for compression could, in principle, allow lossy compression of text. If a sequence of tokens happens to be costly to encode (the model assigns it low probability, so it takes many bits), the compressor might be able to replace that sequence with a cheaper one that has a very similar meaning in context. I don't imagine it would be very useful for anything, but it's interesting to think about.
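A minimal sketch of the idea follows. It assumes a made-up bigram probability table standing in for the LLM, and it assumes the list of "similar meaning" paraphrases is already given. Deciding which rewrites preserve meaning is the hard part and is not modeled here. The names `TOY_MODEL`, `cost_bits`, and `cheapest_paraphrase` are hypothetical. Cost is the sum of -log2 p(token | context), which is roughly what an arithmetic coder driven by the model would spend.

```python
import math

# Hypothetical toy "language model": P(next token | previous token).
# A real LLM conditions on the whole preceding context; these numbers are
# invented purely for illustration.
TOY_MODEL = {
    ("<s>", "the"): 0.5,
    ("the", "car"): 0.2,
    ("the", "automobile"): 0.01,
    ("car", "stopped"): 0.3,
    ("automobile", "stopped"): 0.3,
}

FLOOR = 1e-6  # probability for pairs the toy model has never seen


def cost_bits(tokens):
    """Ideal entropy-coded size in bits: sum of -log2 P(token | previous token)."""
    total = 0.0
    prev = "<s>"
    for tok in tokens:
        p = TOY_MODEL.get((prev, tok), FLOOR)
        total += -math.log2(p)
        prev = tok
    return total


def cheapest_paraphrase(candidates):
    """Lossy step: among paraphrases assumed to mean the same thing, keep the cheapest."""
    return min(candidates, key=cost_bits)


original = ["the", "automobile", "stopped"]
paraphrases = [original, ["the", "car", "stopped"]]
best = cheapest_paraphrase(paraphrases)
print(best, round(cost_bits(original), 2), round(cost_bits(best), 2))
# prints: ['the', 'car', 'stopped'] 9.38 5.06
```

Here the rarer word "automobile" costs about 6.6 bits against about 2.3 for "car", so a lossy compressor would quietly swap it and save roughly 4 bits.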