Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm not sure how your solution gives a uniformly distributed sample?


So you take the first item in the stream. Then you take the second. Flip a coin. Pick the winner, discard the loser. Promote the winner to level 2. Take in the next 2. Promote hte winner to level 2, discard the loser. Pick a winner from the two level twos, discard the loser. Promote the winner to level three. The winner at the top level k will always be picked with probability 1/2^k. If there were 2^k elements, this is exactly the probability it should have; and by the symmetry of selection at each level, it's uniform.

There's some complications if the sample is not a power of two (or whatever number of elements you picked for each level). Essentially though, you know the number of elements at that point though, and you can weight the probability of selection so that the winner has exactly 1/N probability of having been selected.


Right, that works; but it seems different from "Pick one from first 10, pick one from next 100, pick one from next 1000".


Fundamentally it's the same.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: