Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> One obvious drawback is that you have to flip the coin a 1000 times to produce the first unbiased random bit, while Neumann's method starts producing bits much earlier.

Definitely. Though you could fix that problem relatively easily.

I think you might even be able to run von Neumann's method first, but store the coin flips; and then once you've got enough stored, extract a few more bits from the already used flips.

Perhaps like this:

When you do two flips, you add one of three possible tokens to your list:

'double-heads', 'double-tails' or 'mixed'.

Crucially, you only store 'mixed' and not whether it was 'head-tails' or 'tails-heads' because that information was already used to produce the von-Neuman-bit.

After your list has 1000 entries, you run an algorithm a bit like what I originally described to extract bits. The complication is that the table you construct has all possibilities of shuffling a fixed number of 'double-heads', 'double-tails' or 'mixed' tokens.



It feels like an awkward middle ground. Like this method has a limited entropy output for the first 2000 coin flips (first 1000 double-heads/double-tails/mixed entries), and then suddenly it adds back a ton of lost entropy.

An other commenter linked to some papers for asymptotically optimal entropy generation, I wonder if there is more of a streaming method there. It feels like there has to be, even maybe after a slow start. My naive intuition is that after 1000000 coin flips you have a good idea what p is, and then you can basically do arithmetic coding from there. Of course a theoretically correct method can't do exactly this, but it might asymptotically approach it.


Oh, if you want something practical, the approach that Linux's /dev/random takes is probably the one to go.

/dev/random being unbiased relies on some assumptions about cryptography; but in practice these assumptions are at least as well-founded as our assumption that our coin flips are independent.

Look at some of the papers mentioned in other comments on this submission. There are (near) optimal streaming methods.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: