It's no surprise that SipHash from DJB is so commonly used as default hash funct...

rurban · on Sept 12, 2016

It's not. It's not resistant against hash-flooding attacks, and it's the slowest hash function which is used in the wild. It has some good hash function qualities, but none of its qualities would qualify it as a good hash table function. the authors only compare the speed against secure hash functions like MD5 and SHA*, but then are talking about hash tables and compare it against FNV, Murmur and City. (City was replaced by Farm btw. some years ago already)

Below some guys quote my smhasher README, where I tried to get some common sense out.

There are secure hash functions and there are secure hash tables. The topic we are talking here, starting with Robin Hood. But those two topics are mostly orthogonal. A 32-64bit hash function can never be called secure, it's trivial to brute force it. A secure hash function is not usable in a hash table with 32bit, for which 7-15bits are needed for attacks. Get over it and use proper hash tables, and esp. not linked lists. We are not in the eighties anymore.

kbd · on Aug 31, 2016

From the SMhasher page:

> The hash table attacks described in SipHash... cannot not be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from the sort-order, so you need to protect your collision handling scheme from the worst-case O(n)...

> I.e. the usage of siphash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. siphash is not secure enough for security purposes and not fast enough for general usage. ... Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic or linear collision scheme, such as Robin Hood or Cockoo hashing.

nabla9 · on Sept 1, 2016

>if you detect the seed, e.g. from the sort-order,

Good luck with detecting 128-bit secret key from sort order.

>State-recovery.

>A simple strategy to attack SipHash is to choose three input strings identical except for their last word, query for their respective SipHash outputs, and then \guess" the state that produced the output v1 ⊕ v2 ⊕ v3 ⊕ v4 for one of the two strings. The attacker checks the 192-bit guessed value against the two other strings, and eventually recovers the key. On average d2^191 evaluations of SipRound are computed.

>Internal collisions.

>As for any MAC with 256-bit internal state, internal collisions can be exploited to forge valid tags with complexity of the order of 2^128 queries to SipHash. The padding of the message length forces attackers to search for collisions at the same position modulo 256 bytes.

https://131002.net/siphash/siphash.pdf

rurban · on Sept 12, 2016

You don't need luck, you just need enough timing data on the collisions and ordering and then use a solver iteratively to produce testcases to get down to the seed. With normal sized hash tables and measurable collision timings you work with 10 bits. It's a matter of a few hours CPU time. Exploiting public JSON interfaces e.g sounds easiest.

My claim stands that SipHash for hash tables is pure snake oil. Use a proper hash table instead, esp. when they are faster also. Linked lists are 1980ies technology. Even djb can make mistakes. But to his rescue his old solution to this problem is still one of the best. I would rather blame the other guy who came up with these questionable recommendations, and esp. the complang guys for trusting snake oil.

Manishearth · on Sept 1, 2016

I think that comment is a bit extreme.

> You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists

or protect the seed from being revealed. Timing attacks can be used to reveal the seed in some situations, but not all.

It also says "for security purposes", which is pretty vague. It's certainly not secure enough for a security-related application. But for most use cases it's fine when you want to just have some protection against map DOSes. Not being useful for security-related applications doesn't make it security theater.

It also says that it's useless in Rust (among others), and then goes on to say that robin hood hashing is a solution. Rust has had robin hood hashing for years.

rurban · on Sept 12, 2016

I wrote that. It is not extreme.

"Security purposes" is mostly attacking routers, dns daemons or kernels via hash flooding which don't have that bad hash tables as the mentioned languages. Interestingly those devs are snake oil resistant.

Security theatre is the claim by siphash and its proponents. The siphash paper analyzes a very small part of hash tables usage, and then wildly exaggerates its logic. People are reading it as the claims in the section before "7 Application: defense against hash flooding" would affect hash tables also. No, they are talking purely about the hash function there. But then in section 7 they act as like all hash tables are only implemented as the simpliest and worst of all, linked lists chaining. Only the worst hash tables are using this collision resolution scheme. It is slow and it is "insecure" by default. There is no need to add a pseudo-secure layer of siphash on top of it as it does not help. The idea how such a hash table would be attacked are from 2003. We are now in 2016, and people are using efficient sat solvers (first just z3 in python, then producing scripts which produce CNF, and then eventually cbmc which can use the siphash code verbatim), not the simple ad hoc attempts they are describing. Their analysis is partially sound, but the result and claim is completely wrong. We are just lucky that such hash flooding is uncool, as attacking an application service is nowadays much easier. You can read about it here almost every other day.

Rust uses a uselessly slow hash function while it was already protected by the balancing characteristics of robin hood. Therefore switching to siphash is doing more harm than good. It's using a 10-100x slower hash function which is completely useless. Count the average collisions before and after, and compare the time won there against the time lost in the hash function. It's a dramatic loss.

It wasn't a dramatic loss in python, no idea why. I guess they have some other grave mistake in their hash table. It is a dramatic loss in perl5. Even if their hash table is the worst of all open source projects. Well, at least they randomize the iterator now. Before they used a zero-invariant function (trivial to attack by adding \0), and didn't count collisions when fighting collision attacks.