Actually, if you look at real-world examples, you don't always get that EXACT distribution, but rather the phenomenon that the smaller digits are way more prevalent.
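
For reference, the "EXACT distribution" in question is Benford's law, which gives the probability of leading digit d as log10(1 + 1/d). A minimal Python sketch that tabulates it (the function name is just for illustration):

```python
import math


def benford_probabilities() -> dict[int, float]:
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for d, p in benford_probabilities().items():
    print(f"{d}: {p:.3f}")
```

The nine values telescope to log10(10) = 1, and they fall steadily from about 30% for a leading 1 to under 5% for a leading 9.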

And once again, this follows from a simple analysis. Let me state it a DIFFERENT way, and maybe this time you will take note: the only time all possible digits have an equal probability of being the leading digit is when we have a uniform distribution whose max is a power of 10. Literally every other distribution starts to exhibit the phenomenon. Now, you can quibble about which distributions lead to that EXACT curve fit, and there may be explanations for why power-law distributions do. But other distributions exhibit this same PHENOMENON without necessarily converging to those exact proportions. As I said before, any other uniform distribution would not have those exact proportions, but would still exhibit the phenomenon.
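
The uniform case is easy to check by counting leading digits exactly over the integers 1..N. This Python sketch (the helper name is hypothetical) compares a max just under a power of 10 with one that isn't:

```python
from collections import Counter


def leading_digit_shares(n: int) -> dict[int, float]:
    """Fraction of the integers 1..n (uniform with max n) whose leading digit is 1..9."""
    counts = Counter(int(str(k)[0]) for k in range(1, n + 1))
    return {d: counts[d] / n for d in range(1, 10)}


# Max just under a power of 10: every digit gets exactly 1111/9999 = 1/9.
print({d: round(p, 3) for d, p in leading_digit_shares(9999).items()})
# Max not at a power of 10: digits 1-4 get 1111/4999 each, digits 5-9 only 111/4999.
print({d: round(p, 3) for d, p in leading_digit_shares(4999).items()})
```

Moving the max anywhere other than just below a power of 10 tilts the shares toward the small digits.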

Basically, any continuous distribution is highly unlikely to have a cliff exactly at a power of 10. It is going to taper off gradually, so if it includes the range 8000-9000, it will probably also include numbers above 10,000. And even a discontinuous one, such as a uniform distribution with a cliff at the end, will exhibit the phenomenon. OK? So if you have the range 8000-9000 in there, 1s and 2s will be a lot more prevalent: with a continuous distribution, numbers of 10,000+ will be there, but perhaps not numbers of 90,000+.
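
To see the phenomenon in a continuous distribution with no cliff, here is a small simulation. The exponential distribution with mean 5000 is an arbitrary choice made only for this sketch, not something the argument depends on; other smoothly tapering densities are worth trying too. Helper names are hypothetical.

```python
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """Leading significant digit of a positive number, read off its scientific notation."""
    return int(f"{x:e}"[0])


def leading_digit_freqs(samples: list[float]) -> dict[int, float]:
    """Share of the positive samples whose leading digit is 1..9."""
    positive = [x for x in samples if x > 0]
    counts = Counter(leading_digit(x) for x in positive)
    return {d: counts[d] / len(positive) for d in range(1, 10)}


rng = random.Random(42)  # fixed seed so the run is reproducible
# Exponential with mean 5000: the density decays smoothly, with no cliff at 10,000.
samples = [rng.expovariate(1 / 5000) for _ in range(100_000)]
freqs = leading_digit_freqs(samples)
for d, f in freqs.items():
    print(d, round(f, 3))
```

With this choice, leading 1s come out several times more common than leading 9s, even though nothing about the distribution was tuned toward Benford's law.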

Do you at least get the intuition behind this? As soon as you get numbers close to a power of 10, your distribution probably includes numbers in the next order of magnitude, i.e. a lot of leading 1s. The more numbers you see starting with 9, the less likely it is that you're sitting right at the max of your distribution, at a cliff. Unless, that is, you're looking at the contrived "empirical test" that was linked as "proof" that uniform distributions give every digit an equal chance of being the leading digit.
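
The intuition can be checked directly: for uniform integers 1..N, the share of leading 1s bottoms out at 1/9 when N sits just under a power of 10, and climbs as soon as the range reaches into the next order of magnitude. A short Python sketch (hypothetical helper name) that sweeps a few maxes:

```python
def count_leading_ones(n: int) -> int:
    """How many integers in 1..n have leading digit 1."""
    total = 0
    power = 1
    while power <= n:
        # Integers in [power, 2*power - 1] that do not exceed n.
        total += max(0, min(2 * power - 1, n) - power + 1)
        power *= 10
    return total


for n in (5000, 8000, 9000, 9999, 12000, 15000, 19999):
    print(n, round(count_leading_ones(n) / n, 3))
```

At N = 9999 the share is exactly 1111/9999 = 1/9; by N = 19999 it is 11111/19999, more than half.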

The intuition is what matters. Now, maybe for UNIFORM distributions, or POWER distributions, that exact curve fit can be worked out. Perhaps you can show there is a "large" family of distributions for which the curve fits, kind of like the law of large numbers. But I didn't do anything quite as ambitious. I simply showed why it's true not just for normally distributed processes, but also for others which you would think are uniformly distributed. Chances are, such a process has more 1s than 2s, and more 2s than 3s, in exactly the proportions that a uniform distribution with some max would have, and chances are that max isn't exactly a power of 10.
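
One way to connect this to the exact curve: if the max is unknown and you treat it as spread evenly on a log scale (an assumption made only for this sketch), you can average the uniform-with-a-cliff proportions over many possible maxes. In the continuum limit that average works out to log10(1 + 1/d), and the Python sketch below (hypothetical helper names) prints the averaged shares next to Benford's values:

```python
import math


def leading_digit_counts(n: int) -> list[int]:
    """counts[d - 1] = number of integers in 1..n with leading digit d."""
    counts = [0] * 9
    power = 1
    while power <= n:
        for d in range(1, 10):
            lo, hi = d * power, (d + 1) * power - 1
            if lo > n:
                break
            counts[d - 1] += min(hi, n) - lo + 1
        power *= 10
    return counts


def averaged_shares(lo_exp: float, hi_exp: float, steps: int) -> list[float]:
    """Average the leading-digit shares of uniform(1..N) over maxes N evenly spaced on a log scale."""
    totals = [0.0] * 9
    for i in range(steps):
        n = round(10 ** (lo_exp + (hi_exp - lo_exp) * i / steps))
        counts = leading_digit_counts(n)
        for d in range(9):
            totals[d] += counts[d] / n
    return [t / steps for t in totals]


avg = averaged_shares(3, 5, 2000)  # maxes from 1,000 up to 100,000
for d in range(1, 10):
    print(d, round(avg[d - 1], 3), round(math.log10(1 + 1 / d), 3))
```

Every individual max already gives non-increasing shares from 1 to 9; the log-scale averaging is what pulls them onto the Benford curve.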

In practice, all this means is that the distribution may look like Benford's law for 1, 2, and 3, and then drop to equally small shares for 4 through 9. The 1 is going to be 10x more prevalent than the 9. The 2 would be 5x more prevalent, or whatever, UNLESS the distribution had a cutoff right before 200 or 2000. Understand? And this DOES HAPPEN.
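
The shape described here falls straight out of a uniform distribution with a hard cutoff. This Python sketch (hypothetical helper name) counts leading digits exactly for a cutoff just under 4000, and for one just under 2000:

```python
from collections import Counter


def leading_digit_counts(n: int) -> dict[int, int]:
    """Count the integers 1..n by leading digit (a uniform distribution with a hard cutoff at n)."""
    counts = Counter(int(str(k)[0]) for k in range(1, n + 1))
    return {d: counts[d] for d in range(1, 10)}


# Cutoff just under 4000: digits 1-3 each get 1111, digits 4-9 each get 111.
print(leading_digit_counts(3999))
# Cutoff just under 2000: only the 1 is boosted; the 2 is no more common than the 9.
print(leading_digit_counts(1999))
```

With the cutoff at 3999, the 1 is roughly 10x as common as the 9 while 4 through 9 sit at the same small count; with the cutoff at 1999, the 2 gets no boost at all.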


