The Lost Quarter-Century in Data Compression (podly.tv)
87 points by davidst on Aug 16, 2010 | 14 comments


Huffman coding is computationally much simpler: take a histogram of the input tokens and build a token->bitstream mapping. At the time it was released, a newish computer would be an IBM 650 with 2000 words of memory on the drum and 60 words in core. (Note the absence of 'k', 'm', or 'g' in those numbers.) It was "quite popular": 1800 of them were sold.
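
For anyone who wants to see how little machinery the idea needs, here's a rough Python sketch of my own (a toy illustration, not Huffman's 1952 formulation): build the histogram, then repeatedly merge the two lightest subtrees.

    import heapq
    from collections import Counter

    def huffman_codes(data):
        # Histogram of the input tokens.
        freq = Counter(data)
        # Heap entries: (weight, tiebreaker, {token: code-so-far}).
        heap = [(w, i, {tok: ""}) for i, (tok, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:
            return {tok: "0" for tok in freq}
        tiebreak = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees
            w2, _, c2 = heapq.heappop(heap)
            merged = {t: "0" + c for t, c in c1.items()}
            merged.update({t: "1" + c for t, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]   # token -> bitstring mapping

    text = "the quick brown fox"
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)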

The LZ algorithms of the later '70s are more computationally involved, with dictionary lookups in mutating dictionaries. At this time a newish computer would be a PDP-11 with 64k words of memory and a hard disc drive that could store megabytes! I'm not sure how many were sold, but universities were littered with them.
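
To make the "mutating dictionary" point concrete, here's a toy LZW-style sketch (LZW is a 1984 descendant of LZ78, so treat this as illustrative rather than what the original papers describe): every miss grows the dictionary, and every step consults it.

    def lzw_compress(data):
        # Seed the dictionary with all single bytes.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        w, out = b"", []
        for b in data:
            wc = w + bytes([b])
            if wc in dictionary:
                w = wc                        # keep extending the match
            else:
                out.append(dictionary[w])     # emit code for the longest match
                dictionary[wc] = next_code    # the dictionary mutates here
                next_code += 1
                w = bytes([b])
        if w:
            out.append(dictionary[w])
        return out

    print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))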

I suggest LZ style algorithms merely waited for appropriate physical tools.


It's more the rule than the exception when research is done on ideas that at the time of publication are impractical to manifest as a working machine or program. To pick just one (possibly the most well known) example: Einstein's paper on the special theory of relativity was published in 1905. The most significant outcome of this paper took 40 years to manifest itself. Einstein certainly wasn't waiting for appropriate physical tools.

Lempel and Ziv's critical insight - that modelling and coding of probabilities are distinct and separable - could have been understood decades earlier. It wouldn't have mattered that their research couldn't be immediately turned into a working tool. Information Theory would have benefitted and we would surely be further along today.

But, yes, it is certainly true that advances in technology give us an incentive to ask questions that had previously been ignored or misunderstood.


Computer science actually seems to be the exception. Yes, the insight could have been had earlier, but it somehow wasn't, and this is common in computer science. I now attempt to support this thesis.

Even the stereotypical "impractical" research algorithms (with efficient complexity but huge constants) are implemented. That's the magic of computer science: you can breathe life into abstract ideas. But if you cannot implement it at all, there are millions of other cool ideas that you can implement, so you may gravitate towards them. (Another argument is that if you can't implement it, it's classed as pure maths. And if you can't prove it, or define it clearly as a conjecture, I'm not sure what it is or what you could do with it - perhaps futurism/science fiction, which has its place.)

The value of implementation is due to computer science ideas tending to be complex and abstract - but once you have an actual concrete working model that you can play with, they can become almost trivial to understand. And so, comparatively, more progress in research is made based on working code than on purely theoretical ideas. It's so much easier to find mistakes in an idea if you can build it. And once your theory is embodied in living, working code, it's tremendously easier to reason from that as a base; much as it is easier to see what is over the next hill after climbing it.

Hardware is closer to your idea of incentive: the inventor of the Wallace tree multiplier told me that it was a simple and obvious idea, but it wasted a lot of silicon, and no one entertained that kind of waste. It simply wasn't conceivable. Perhaps this is due to the incredible rate of improvement in silicon (Moore's Law), so that it's psychologically difficult to appreciate (or even believe) that your iPhone will be 1000 times faster in 15 years, or what that change even means for algorithms. Perhaps the most basic assumptions of the algorithm change? When a x10 quantitative improvement often translates into a qualitative improvement, what does a x1000 change mean? Too hard to predict.


Computer science gives us a better opportunity to test ideas than, say, particle physics. The ability to experiment so easily is a major part of the appeal for many of us.

> Even the stereotypical "impractical" research algorithms (with efficient complexity but huge constants) are implemented.

Well, they're all Turing machines, but you're not going to find any video processing algorithms that were implemented on a 4K Altair, so let's not go too far.

It's interesting to consider a given novel idea and wonder how far back in time it would have been possible to implement it in a practical way and how much further back in time it would have been useful simply to know how it works.

Let me pick a concrete example that we are all familiar with: hashing. Hashing and hash lookup algorithms have been around a long time. It's an area that people might expect would have been exhausted of significant new ideas long ago. Not so! Cuckoo hashing was invented in 2001 and gives us dramatically more efficient use of memory at a slight performance cost. It can make the difference between a hash table that fits in memory and one that doesn't, so it's a meaningful advance.
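
To make that concrete, here is a minimal Python sketch of the two-table variant (my own toy, with naive hash functions and no rehashing, so treat it as illustrative only): every key lives in one of two candidate slots, so a lookup probes at most two places, and an insert kicks out whatever is in the way.

    class CuckooHash:
        def __init__(self, size=11):
            self.size = size
            self.tables = [[None] * size, [None] * size]

        def _slots(self, key):
            # Two hash functions; real implementations choose these more
            # carefully and rehash when insertions start cycling.
            return [hash((key, 0)) % self.size, hash((key, 1)) % self.size]

        def lookup(self, key):
            s = self._slots(key)
            return self.tables[0][s[0]] == key or self.tables[1][s[1]] == key

        def insert(self, key, max_kicks=32):
            if self.lookup(key):
                return True
            which = 0
            for _ in range(max_kicks):
                slot = self._slots(key)[which]
                key, self.tables[which][slot] = self.tables[which][slot], key
                if key is None:
                    return True       # landed in an empty slot
                which = 1 - which     # the evicted key tries its other table
            return False              # a real table would rehash here

    t = CuckooHash()
    for k in ["alpha", "beta", "gamma"]:
        t.insert(k)
    print(t.lookup("beta"), t.lookup("delta"))   # True False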

Cuckoo hashing could have been implemented on a 4k Altair and it would have been useful there. But no one had the idea then.

But wait-- It gets stranger! Cuckoo hashing was literally under our noses and somehow no one saw it! It is a variation of set-associative hashing, which is what most CPUs had already been doing in their caches for years. It's not exactly the same algorithm, but if you check you'll see it's so close the distinctions are minor.

Fascinating, isn't it? Here is an important algorithm that was actually live and running in hardware yet no one considered building it in software for a long time.


Isn't the last part of the Cuckoo hashing paper the part that documents it as slightly less good compared to existing algorithms? I mean it never comes out and says it, but if you look at the pictures… I think Cuckoo's best point was the provability of its bounds, but it has been a while since I read that paper.

I like the example of Parsing Expression Grammars. The idea of memoized top-down parsing drifted past back in the '60s and early '70s, when folks like Aho and Ullman were busy slaying wild streams of symbols with ever more clever algorithms, but was dismissed because of its memory footprint. Then in 2002 or so Bryan Ford said "Oi! What do you mean too much memory? My PC has an L2 cache bigger than that."[1] Now PEGs are widely used and, I must say (having built dynamic-grammar LR systems back when my time-shared VAX had a main memory the size of the L2 cache in my iMac), especially useful for dynamic grammars.

[1] Being an entirely fabricated quote, there is no applicable reference for this quotation.
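
For anyone who hasn't played with packrat parsing, here is a toy Python sketch of the memoization trick for a tiny arithmetic grammar (my own illustration, not Ford's formulation): each rule is a function from input position to the position after a match, and caching those results is exactly the memory-for-linear-time trade being described.

    from functools import lru_cache

    # Toy packrat-style recognizer for:
    #   Expr <- Term ('+' Term)*
    #   Term <- Number / '(' Expr ')'
    # Each rule maps a position to the position after a match, or None.

    def parse(text):
        @lru_cache(maxsize=None)          # the packrat memo table
        def expr(pos):
            pos = term(pos)
            while pos is not None and pos < len(text) and text[pos] == "+":
                nxt = term(pos + 1)
                if nxt is None:
                    break
                pos = nxt
            return pos

        @lru_cache(maxsize=None)
        def term(pos):
            if pos < len(text) and text[pos].isdigit():
                while pos < len(text) and text[pos].isdigit():
                    pos += 1
                return pos
            if pos < len(text) and text[pos] == "(":
                inner = expr(pos + 1)
                if inner is not None and inner < len(text) and text[inner] == ")":
                    return inner + 1
            return None

        return expr(0) == len(text)

    print(parse("1+(2+34)"))   # True
    print(parse("1+("))        # False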


> Isn't the last part of the Cuckoo hashing paper the part that documents it as slightly less good compared to existing algorithms?

No, Cuckoo hashing is a trade of a little run-time performance for better memory utilization. It's still O(1). The performance cost amortizes as a constant.

If you have gobs of memory you don't need it. But if you're memory-constrained it's a big win.


If you're interested in another example of how success breeds stagnation, see On Proof and Progress in Mathematics by William Thurston, available at http://www.ams.org/journals/bull/1994-30-02/S0273-0979-1994-...

In particular, look at section 6, where he describes how his research into foliations had the effect of killing the field for a number of years. He contrasts this with how his different approach to the geometrization of 3-manifolds led to the healthy growth of the field, by, among other things, his efforts at exposition and his deliberate withholding of results he already knew the answer to so that other people could prove them instead.


This is an excellent article, from a researcher who knows what he's talking about (Thurston is a Fields medalist). Section 6 begins on the page numbered 173.


> his deliberate withholding of results he already knew the answer to so that other people could prove them instead

That's extraordinary! I've never heard of such a thing. Are other cases known?


Gauss seems to have proved lots of (surely publishable) things that he didn't publish. I don't know what his reasons were.


According to Erdos, Gauss was very discouraging to his students, because when they came and told him that they'd proven something, he would tell them he'd proven it years before.

from "The man who loved only numbers.' by Paul Hoffman.


Silly.

Practical dictionary-based schemes have an entropy encoder on the back end and often that's a Huffman encoder or something similar in spirit.

Huffman encoding is superseded not by dictionary-based methods, but by arithmetic encoding, which can compress beyond the Huffman limit because it can use fractions of a bit.

7zip, for instance, beats the pants off gzip because it uses a very large dictionary window AND arithmetic compression instead of Huffman encoding on the back end.
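
A quick back-of-the-envelope in Python to show what "fractions of a bit" buys you (my numbers, not anything from the article): for a heavily skewed source the entropy is far below one bit per symbol, but Huffman can never emit a code word shorter than one bit, while an arithmetic coder can get arbitrarily close to the entropy figure.

    import math

    # A heavily skewed binary source.
    p = {"a": 0.99, "b": 0.01}

    # Shannon entropy: the best achievable average, in bits per symbol.
    entropy = -sum(q * math.log2(q) for q in p.values())

    # Huffman is stuck at one whole bit per symbol here, since every
    # code word is at least one bit long.
    print(f"entropy ~ {entropy:.4f} bits/symbol")   # ~0.0808
    print("huffman = 1.0000 bits/symbol")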


I think you're missing the point.

The article does not claim that Huffman coding was superseded by dictionary methods. It does not confuse statistical modelling with encoding.


I remember reading an interview with Phil Katz (pkzip) like a decade ago about how they were looking next at Markov models for more advanced compression. It even made me go to the library to look up "Markov".

So over a decade later, nothing really new.




