
I never understood the arguments against using globally unique ids. For example, how it somehow messes up indexes. I’m not a CS major, but those are typically b-trees, are they not? If you have a primary key whose generation is truly random such that each number is equally likely, then that b-tree is always going to be balanced.

Yes, there are different flavors of generating them with their own pros and cons, but at the end of the day it’s just so much more elegant than some auto incrementing crap your database creates. But that is just semantics; you can always change the uuid algorithm for future keys. And honestly, if you treat the uuid as some opaque entity (which you should), why not just pick the random one?

And I just thought of the argument that “but what if you want to sort the uuid…” say it’s used for a list of stories or something? Well, again… if you treat the uuid as opaque, why would you sort it? You should be sorting on some other field like the date field or title or something. UUIDs are opaque, damn it. You don’t sort opaque data. “Well, they get clustered weird,” say people. Why are you clustering on a random opaque key? If you need certain data to be clustered, then do it on the right key (the user_id field if your data is to be clustered by user, say).

Letting the client generate the primary keys is really liberating. Not having to care about PK collisions or leaking information via auto incrementing numbers is great!
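To make the client-generated key point concrete, here is a minimal sketch using Python's standard `uuid` and `sqlite3` modules. The `stories` table and the `create_story` helper are hypothetical names chosen for illustration, not anything from a real schema.

```python
import sqlite3
import uuid

# Hypothetical schema: the id is an opaque string chosen by the client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id TEXT PRIMARY KEY, title TEXT NOT NULL)")

def create_story(conn, title):
    """Insert a row whose primary key is generated before the database is involved."""
    story_id = str(uuid.uuid4())  # random (version 4) UUID
    conn.execute(
        "INSERT INTO stories (id, title) VALUES (?, ?)", (story_id, title)
    )
    return story_id

first = create_story(conn, "First story")
second = create_story(conn, "Second story")
```

The caller knows the id before the insert round trip completes, and the id says nothing about how many rows exist.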

In my opinion uuid isn’t used enough!





> If you have a primary key whose generation is truly random such that each number is equally likely, then that b-tree is always going to be balanced.

Balanced and uniformly scattered. A random index means fetching a random page for every item. Fine if your access patterns are truly random, but that's rarely the case.
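A rough way to see the scattering is a toy model, not a real b-tree: pack the sorted keys into fixed-size pages and count how many distinct pages hold the most recently inserted rows. The page size, row counts, and the `pages_touched` helper are all assumptions made for this sketch.

```python
import random

def pages_touched(keys, batch, page_size=100):
    """Count distinct pages holding `batch`, with sorted keys packed page_size per page."""
    rank = {k: i for i, k in enumerate(sorted(keys))}
    return len({rank[k] // page_size for k in batch})

rng = random.Random(0)
n, batch_size = 100_000, 100

sequential = list(range(n))                             # auto-increment style keys
random_keys = [rng.getrandbits(122) for _ in range(n)]  # uuid4-like random keys

# "Recent" rows are the last batch_size keys generated.
seq_pages = pages_touched(sequential, sequential[-batch_size:])     # 1 page
rand_pages = pages_touched(random_keys, random_keys[-batch_size:])  # close to one page per row

print(seq_pages, rand_pages)
```

With sequential keys the newest rows share a single page; with random keys almost every recent row sits on a different page, which is what hurts when the workload mostly touches recent data.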

> Why are you clustering on a random opaque key?

InnoDB clusters by the PK if there is one, and that can't be changed (if you don't have a PK, you have some options, but let's assume you have one). MSSQL behaves similarly, but you can override it. If your PK is random, your clustering will be too. In Postgres, you'll just get fragmented indexes, which isn't quite as bad, but still slows down vacuum. Whether that actually becomes a problem is also going to depend on access patterns.

One shouldn't immediately freak out over having a random PK, but should definitely at least be aware of the potential degradation it might cause.


I feel, honestly, like while you are indeed correct, for most cases it’s absolutely fine to use some flavor of uuid. I feel like the benefits outweigh the costs in most cases.

Sure, and for many cases, uuidv7 is that flavor. It just comes with a timestamp, which may or may not be an issue. It isn't an issue for me, which is why I use it myself.
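For reference, a hand-rolled sketch of the uuidv7 bit layout as described in RFC 9562 (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The function names `uuid7_sketch` and `timestamp_ms` are made up for illustration; this is not a library API, and it skips the optional monotonicity guarantees within a single millisecond.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Build a version 7 style UUID: timestamp in the high bits, randomness below."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)

def timestamp_ms(u):
    """Recover the creation time, which anyone holding the id can also do."""
    return u.int >> 80

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
```

Ids created in later milliseconds compare greater, so new rows land near each other in the index, and `timestamp_ms` shows the creation time that comes along with that.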


