
Noob question, but why not use ints for the PK, and UUIDs for a public_id field?




If you put an index on the UUID field (because you have an API where you can retrieve objects by UUID) you have more or less the same problem, at least in Postgres, where a primary key index and a secondary index are essentially the same thing. It is perfectly valid in Postgres to not define any primary key for a table, because rows are stored on disk in a heap addressed by an internal tuple ID, and the indexes, primary or not, just point at that tuple ID. Plus there is the wasted space of having two indexes on the same table.

Of course this is not always bad. For example, if you have a lot of relations, you can keep the UUID field (and thus the expensive index) in only one table and have the relations use the more efficient int key. Say you have a user entity with both int and UUID keys, and a user attribute table that references the user by the int key. The cost is a join whenever you need to fetch a user attribute by UUID and would not otherwise need the user row.
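A rough sketch of that layout, using Python's built-in sqlite3 rather than Postgres so it runs anywhere. The table and column names (users, user_attributes, public_id, attr_name, attr_value) and the two helper functions are made up for illustration:

```python
import sqlite3
import uuid

# In-memory SQLite database; a stand-in for Postgres, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,      -- compact internal key
    public_id TEXT NOT NULL UNIQUE,     -- UUID exposed through the API (indexed once)
    name      TEXT NOT NULL
);
CREATE TABLE user_attributes (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),  -- int FK, no UUID here
    attr_name  TEXT NOT NULL,
    attr_value TEXT NOT NULL
);
""")


def create_user(name):
    """Insert a user and return (internal int id, public UUID string)."""
    public_id = str(uuid.uuid4())
    cur = conn.execute(
        "INSERT INTO users (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return cur.lastrowid, public_id


def attributes_by_public_id(public_id):
    """Look up attributes from the outside: this is the join mentioned above."""
    rows = conn.execute(
        """
        SELECT a.attr_name, a.attr_value
        FROM user_attributes AS a
        JOIN users AS u ON u.id = a.user_id
        WHERE u.public_id = ?
        """,
        (public_id,),
    ).fetchall()
    return dict(rows)


user_id, public_id = create_user("alice")
conn.execute(
    "INSERT INTO user_attributes (user_id, attr_name, attr_value) VALUES (?, ?, ?)",
    (user_id, "theme", "dark"),
)
print(attributes_by_public_id(public_id))
```

Only the users table carries the UUID and its unique index; everything that hangs off a user joins on the small int key.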


You can create hash indexes in Postgres, so a secondary index on the UUID seems workable:

https://www.postgresql.org/docs/current/hash-index.html


*edit: sorry, misread that. My answer is not valid to your question.

original answer: because if you don't generate these ints randomly, they are sequential, which leads to many unwanted situations where people can guess valid IDs and deduce things from that data. See https://en.wikipedia.org/wiki/German_tank_problem
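To make the "deduce things" part concrete, here is a small sketch of the standard German tank estimator, m + m/k - 1, where m is the largest ID seen and k is the number of IDs seen. It assumes IDs start at 1, have no gaps, and that the observed IDs are a uniform sample without repeats. The function name and the sample values are invented for the example:

```python
def estimate_total(observed_ids):
    """Estimate how many rows exist from a handful of leaked sequential IDs."""
    if not observed_ids:
        raise ValueError("need at least one observed ID")
    k = len(observed_ids)   # sample size
    m = max(observed_ids)   # largest ID seen
    return m + m / k - 1


# Four IDs scraped from public URLs (made-up numbers).
sample = [19, 40, 42, 60]
print(estimate_total(sample))  # 60 + 60/4 - 1 = 74.0
```

With random UUIDs as the exposed identifier, the observed values carry no such ordering information.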


Hence the presumed implication behind the public_id field in GP's comment: anywhere identifiers are exposed, you use the public_id field, thereby preventing ID guessing while still retaining the benefits of ordered IDs where internal lookups are concerned.

Edit: just saw your edit, sounds like we're on the same page!


So we make things hard in the backend because of leaky abstractions? Doesn't make sense imo.

Decades of security vulnerabilities and compromises caused by sequential/guessable PKs are (only!) part of the reason we're here. Miss an authorization check anywhere in the application and you're spoon-feeding entire tables to anyone with the inclination to ask for them.

I also think we can use a combination of a PID - persistent ID (I always thought it was public) and an auto-increment integer ID. Having a unique key helps when migrating data between systems or referencing a piece of data in a different system. Also, using serial IDs in URLs and APIs can reveal sensitive information, e.g. how many items there are in the database.

One of the benefits of UUIDs is that you can easily merge data coming from multiple databases. Auto-increments cause collisions.
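A toy illustration of the merge problem, with plain dicts standing in for tables from two independent databases. The helper names and row data are hypothetical:

```python
import uuid


def make_rows_serial(names):
    """Simulate an auto-increment PK: each database starts counting at 1."""
    return {i: name for i, name in enumerate(names, start=1)}


def make_rows_uuid(names):
    """Simulate a random UUID PK generated independently per row."""
    return {uuid.uuid4(): name for name in names}


# Two separate databases using auto-increment keys.
db_a_serial = make_rows_serial(["alice", "bob"])
db_b_serial = make_rows_serial(["carol", "dave"])
colliding_keys = db_a_serial.keys() & db_b_serial.keys()
print(sorted(colliding_keys))  # [1, 2]: both databases used the same keys

# A naive merge silently overwrites rows from the first database.
merged_serial = {**db_a_serial, **db_b_serial}
print(len(merged_serial))  # 2 rows survive out of 4

# The same data keyed by UUID merges without any renumbering.
merged_uuid = {**make_rows_uuid(["alice", "bob"]), **make_rows_uuid(["carol", "dave"])}
print(len(merged_uuid))  # 4
```

With serial keys you would have to renumber one side and rewrite every foreign key that points at it before merging.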

The article mentions microservices, which can increase the likelihood of collisions in sequential incremental keys.

One more reason to stay away from microservices, if possible.


Always try to avoid having two services using the same DB. Only way I'd ever consider sharing a DB is if only one service will ever modify it and all others only read.

Good luck enforcing that :)

The 'collision' is two service classes both trying to use one db.

If you separate them (i.e. microservices) then they no longer try to use one db.


There is nothing stopping multiple microservices from using the same DB, so of course this will happen in practice.

Sometimes it might even be for a good reason.



